Open maybemkl opened 3 years ago
Hi,
This is expected behaviour. When you change this setting, you are choosing between Obsidian's markdown features and Pandoc's markdown features. If you want Pandoc citations you have to give up Obsidian wiki-links, and vice versa. There's no easy way to get the best of both worlds unfortunately.
I realise this setting is poorly worded, so I will probably change it in the future.
Hi, thanks for responding. I understand the issue, but I was thinking it might still make sense to work around it. I wrote this very barebones Python filter that removes the brackets upon export, which works as a patchy solution for now: https://gist.github.com/maybemkl/d9be15bcabadaa19d2ca50c87b59a92e
Yes, that's a fair point - this is one case where it's easy to fix formatting, but I can't fix formatting in general. I'll take a look at this in the next release
Hello @OliverBalfour Thanks for your great plugin! I may have a workaround for having to choose between citations (using Markdown) or everything else (using html). I managed to get both by using pandoc twice:
Here is what my command line looks like:
cat EXPORTED_MARKDOWN_FILE.pandoc.md | sed 's/\\\[/\[/g; s/\\\]/\]/g; s/\\\@/\@/g' | pandoc -s --lua-filter zotero.lua -o DESTINATION_FILE.docx
(obviously you have to download the zotero.lua file for it to work)
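For anyone who prefers not to rely on sed, the un-escaping step of the command above can be sketched in Python. This is only a rough equivalent: the function name and the default character set are assumptions made for illustration, and the result would still have to be piped into pandoc as in the original command.

```python
import re

# Characters whose backslash escapes the sed command removes: [, ] and @.
# (Assumed default; extending it, e.g. with "^" or "=", is untested.)
ESCAPED_CHARS = "[]@"


def unescape_citations(text: str, chars: str = ESCAPED_CHARS) -> str:
    """Turn backslash-escaped characters such as \\[ \\] \\@ back into [ ] @."""
    pattern = re.compile(r"\\([" + re.escape(chars) + r"])")
    return pattern.sub(r"\1", text)


if __name__ == "__main__":
    sample = r"See \[\@doe2020, p. 3\] for details."
    print(unescape_citations(sample))
```

Escapes for any character outside `chars` are left untouched, so other escaped Markdown stays escaped.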
Here is what it looks like on a sample page:
Markdown page in Obsidian, with both Obsidian stuff and citations:
Resulting docx file:
I did not test it extensively yet, but for now it seems to work very well. I know it's kinda hacky but I've seen much hackier code before. If it can give you ideas.
Thanks again,
Felix
Hi @felixchenier, your temporary proposal looks great, but I would be losing footnotes. Any idea for that? I'm also looking to build a filter for Obsidian's ==highlight== syntax: any idea on how to build a Python (or other) filter that would end up with highlighted text in Word? Thanks!
Hi @Limezy Unfortunately this is only a workaround. I haven't played with all that filter stuff, and I'm not sure how my proposal could also apply to footnotes since I didn't try it. If the problem is the presence of an escaped hat in the output pandoc file, then you could change the sed command in my example to also remove escapes for hats:
cat EXPORTED_MARKDOWN_FILE.pandoc.md | sed 's/\\\[/\[/g; s/\\\]/\]/g; s/\\\@/\@/g; s/\\\^/\^/g' | pandoc -s --lua-filter zotero.lua -o DESTINATION_FILE.docx
But I didn't try it. And for highlights, it may be the same thing but for escaped equals:
cat EXPORTED_MARKDOWN_FILE.pandoc.md | sed 's/\\\[/\[/g; s/\\\]/\]/g; s/\\\@/\@/g; s/\\\^/\^/g; s/\\\=/\=/g' | pandoc -s --lua-filter zotero.lua -o DESTINATION_FILE.docx
But I also didn't try it. It may (will) break something, somewhere, at some moment, that's for sure!
Good luck
a usable lua-filter
function Str (str)
  return (str.text
    :gsub('%[%[', '')
    :gsub('%]%]', ''))
end
Wow, that's great @wenbopeng! Works perfectly! I'll try to adapt it using a regex to parse the [[xxx#yyy]] format
For whoever is interested, my current filter :
function Str (str)
  return (str.text
    :gsub('%[%[', '')
    :gsub('%]%]', ''))
end
Now only waiting for the plugin to manage transclusions !
@Limezy
as your previous snippet is the same as wenbopeng's and you wanted to adapt that to handle the [[xxx#yyy]] format...
Do you have another snippet that does that, by any chance?
Actually, I am looking for some inspiration for how to deal with [[xxx^section|alias]] -> alias filtering.
@bvorak and @Limezy I'm also looking for a filter of [[xxx^section|alias]] -> alias and [[xxx#yyy]]. Did you succeed?
https://regex101.com/r/jCiF1r/1 seems to solve most cases for me as of now. But there must be a more accessible way, no? :D What it does not handle are links without an alias but with a section qualifier, like [[xxx#section]].
# whitespace has to be ignored by the editor
(?|
\[\[(?:.[^\|\]]*)\|(.[^\]]*)\]\]
|
\[\[(.[^\|\#\]]*)\]\]
)
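The pattern above relies on the PCRE branch-reset group `(?|...)`, which Python's `re` module does not support. A rough Python equivalent, keeping the two alternatives but using two separate capture groups, could look like the sketch below. The function name is made up for illustration, and the limitation mentioned above (no handling of `[[xxx#section]]`) is preserved on purpose.

```python
import re

# Same two alternatives as the PCRE pattern, with one capture group each:
#   group 1: [[target|alias]] -> alias
#   group 2: [[target]]       -> target (no '|' and no '#')
WIKILINK = re.compile(
    r"\[\[(?:.[^|\]]*)\|(.[^\]]*)\]\]"
    r"|"
    r"\[\[(.[^|#\]]*)\]\]"
)


def strip_wikilinks(text: str) -> str:
    """Replace each matched wikilink with its alias or, failing that, its target."""
    return WIKILINK.sub(lambda m: m.group(1) or m.group(2), text)


if __name__ == "__main__":
    print(strip_wikilinks("See [[Note|my note]] and [[Other]]."))
```

Links with a section qualifier but no alias are left as they are, exactly as with the original pattern.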
@bvorak @jankap my current Lua filter is now a bit crazy.
It detects the [[A#B#C|D]] syntax (because I also have a special [[A#B#C]] plugin), but you can use it as is.
I still have one bug case: the [[wikilink]].[[wikilink]] syntax, with only a dot between two links and no space.
It will replace "[[ANYTHING|D]]" by "D"
It will replace "[[A]]" by "A"
It will replace "[[A#B]]" by "A, B"
It will replace "[[A#B#C]]" by "A, B-C"
Markdown markup inside "A", "B", "C" or "D" will be converted to the equivalent LaTeX markup. For example, "[[A#B#C|This is a *bold* word]]" will be converted to "This is a \emph{bold} word".
You can probably easily change these behaviours using my example as a starting point. ChatGPT may help you get a sense of what's going on. It's not one, but two filters that you have to run one after the other:
Filter one
--[[
Add support for a custom inline syntax.
This pandoc Lua filter allows to add a custom markup syntax
extension. It is designed to be adjustable; it should not be
necessary to modify the code below the separator line.
The example here allows to add highlighted text by enclosing the
text with `==` on each side. Pandoc supports this for HTML output
out of the box. Other outputs will need additional filters.
Copyright: © 2022 Albert Krewinkel
License: MIT
]]
-- Lua pattern matching the opening markup string.
local opening = "%[%["
-- Lua pattern matching the closing markup string.
local closing = "%]%]"
-- Toggle whether the opening markup may be followed by whitespace.
local nospace = true
-- Function converting the enclosed inlines to their internal pandoc
-- representation.
local function markup_inlines(inlines)
  local result = pandoc.Span(inlines)
  result.attr = { class = "wikiLink" }
  return result
end
------------------------------------------------------------------------
local function is_space(inline)
  return inline and
    (inline.t == "Space" or
     inline.t == "LineBreak" or
     inline.t == "SoftBreak" or
     (inline.t == "Str" and inline.text:match("^%s*$")))
end
function Inlines(inlines)
  local result = pandoc.Inlines{}
  local markup = nil
  local start = nil
  for i, inline in ipairs(inlines) do
    if inline.tag == "Str" then
      if not markup then
        local before, first = inline.text:match("(.-)" .. opening .. "(.*)")
        if first then
          start = inline -- keep element around in case the
                         -- markup is not closed. Check if the
                         -- closing pattern is already in this
                         -- string.
          local selfclosing, after = first:match("(.-)" .. closing .. "(.*)")
          if selfclosing then
            result:insert(pandoc.Str(before))
            result:insert(markup_inlines{ pandoc.Str(selfclosing) })
            result:insert(pandoc.Str(after))
          elseif nospace and first == "" and is_space(inlines[i + 1]) then
            -- the opening pattern is followed by a space, but the
            -- config disallows this.
            result:insert(inline)
          else
            local target = first
            local pipe = target:find("|")
            local hashes = {}
            for hash in target:gmatch("#") do
              table.insert(hashes, hash)
            end
            local hashCount = #hashes
            if pipe then
              target = target:sub(pipe + 1)
            elseif hashCount > 0 then
              local sections = {}
              local sectionCount = hashCount + 1
              for section in target:gmatch("[^#]+") do
                table.insert(sections, section)
              end
              if sectionCount == 2 then
                target = table.concat(sections, ", ")
              else
                local firstSection = table.concat(sections, ", ", 1, sectionCount - 2)
                local lastSection = sections[sectionCount - 1] .. "-" .. sections[sectionCount]
                target = firstSection .. ", " .. lastSection
              end
            end
            result:insert(pandoc.Str(before))
            markup = pandoc.Inlines{ pandoc.Str(target) }
          end
        else
          result:insert(inline)
        end
      else
        local last, after = inline.text:match("(.-)" .. closing .. "(.*)")
        if last then
          markup:insert(pandoc.Str(last))
          result:insert(markup_inlines(markup))
          markup = nil
          result:insert(pandoc.Str(after))
        else
          markup:insert(inline)
        end
      end
    else
      local acc = markup or result
      acc:insert(inline)
    end
  end
  -- keep unterminated markup
  if markup then
    markup:remove(1) -- the stripped-down first element
    result:insert(start)
    result:extend(markup)
  end
  return result
end
Filter two
function replaceHashMarks(text)
  local hashCount = select(2, text:gsub("#", ""))
  if hashCount == 1 then
    return text:gsub("#", ", ")
  elseif hashCount >= 2 then
    return text:gsub("#", ", ", 1):gsub("#", "-", 1)
  else
    return text
  end
end
function stringifyInline(inline)
  if inline.t == "Str" then
    return inline.text
  elseif inline.t == "Emph" then
    return "*" .. stringifyWithMarkup(inline.content) .. "*"
  elseif inline.t == "Strong" then
    return "**" .. stringifyWithMarkup(inline.content) .. "**"
  elseif inline.t == "Code" then
    return "`" .. inline.text .. "`"
  elseif inline.t == "Link" then
    -- the link target is a plain string, not a list of inlines
    return "[" .. stringifyWithMarkup(inline.content) .. "](" .. inline.target .. ")"
  elseif inline.t == "Image" then
    -- the image source is a plain string, not a list of inlines
    return "!" .. "[" .. stringifyWithMarkup(inline.content) .. "](" .. inline.src .. ")"
  elseif inline.t == "Space" then
    return " "
  elseif inline.t == "SoftBreak" or inline.t == "LineBreak" then
    return "\n"
  elseif inline.t == "Subscript" then
    return "~" .. stringifyWithMarkup(inline.content) .. "~"
  elseif inline.t == "Superscript" then
    return "^" .. stringifyWithMarkup(inline.content) .. "^"
  elseif inline.t == "Strikethrough" then
    return "~~" .. stringifyWithMarkup(inline.content) .. "~~"
  else
    -- any other inline: concatenate its children, if it has any
    local parts = {}
    for _, elem in ipairs(inline.content or {}) do
      table.insert(parts, stringifyInline(elem))
    end
    return table.concat(parts)
  end
end
function stringifyWithMarkup(content)
  local output = ""
  for _, inline in ipairs(content) do
    local inlineString = stringifyInline(inline)
    if inline.t == "Link" and #inline.content == 1 and inline.content[1].t == "Str" then
      -- Handle wikilinks enclosed within parentheses
      local linkText = inline.content[1].text
      if linkText:sub(1, 1) == "(" and linkText:sub(-1) == ")" then
        inlineString = "(" .. inlineString .. ")"
      end
    end
    output = output .. inlineString
  end
  -- Remove curly braces
  output = output:gsub("{", ""):gsub("}", "")
  return output
end
function Span(span)
  if span.classes:includes('wikiLink') then
    local content = stringifyWithMarkup(span.content)
    local modifiedContent = replaceHashMarks(content)
    modifiedContent = modifiedContent:gsub('%[%[', ''):gsub('%]%]', '')
    local pipeIndex = modifiedContent:find('|')
    if pipeIndex then
      modifiedContent = modifiedContent:sub(pipeIndex + 1)
    end
    -- Convert the modified content to LaTeX
    local modifiedAst = pandoc.read(modifiedContent, "markdown-fancy_lists")
    local latexContent = pandoc.write(modifiedAst, "latex")
    -- Wrap latexContent in curly braces
    latexContent = "{" .. latexContent .. "}"
    local newAttributes = pandoc.Attr(span.identifier, span.classes:filter(function (c) return c ~= 'wikiLink' end), span.attributes)
    return pandoc.RawInline("latex", latexContent)
  end
end
return {
  { Span = Span }
}
Credits to https://github.com/tarleb for getting me started with the wikilink syntax detection
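As a compact reference, the four replacement rules listed above can be sketched on plain text in Python. This is not the filter itself: it skips the pandoc AST entirely, does no Markdown-to-LaTeX conversion, and the function names are invented for illustration. For links with more than two `#`, it follows the behaviour of filter two (first `#` becomes ", ", second becomes "-").

```python
import re

# A wikilink is anything between [[ and ]] that contains no brackets.
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def render_wikilink(inner: str) -> str:
    """Apply the replacement rules to the text found between [[ and ]]."""
    if "|" in inner:                       # [[ANYTHING|D]] -> D
        return inner.split("|", 1)[1]
    parts = inner.split("#")
    if len(parts) == 1:                    # [[A]] -> A
        return parts[0]
    if len(parts) == 2:                    # [[A#B]] -> A, B
        return ", ".join(parts)
    # [[A#B#C]] -> A, B-C (any further '#' is left as is)
    return parts[0] + ", " + parts[1] + "-" + "#".join(parts[2:])


def replace_wikilinks(text: str) -> str:
    return WIKILINK.sub(lambda m: render_wikilink(m.group(1)), text)


if __name__ == "__main__":
    print(replace_wikilinks("[[A#B#C|D]], [[A]], [[A#B]], [[A#B#C]]"))
```

Because the substitution runs on the raw string rather than on pandoc's tokenised inlines, two links separated only by a dot are both replaced in this sketch.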
@Limezy how do you call it?
pandoc -L /data/tools/strip_wikilinks_1.lua -L /data/tools/strip_wikilinks_2.lua ...
seems not to work, there's still one bracket left.
Edit: should the filters also support ![[image.png]] pictures?
@Limezy thanks for the terrific work on creating a full Lua solution.
However, I found Albert's script structure quite long and hard to understand. I created a simplified alternative script preserving only the first two rules:
Replace [[ANYTHING|D]] with D
Replace [[A]] with A
I'm exploiting the fact that in Obsidian the link label can't have formatting and the only elements between the brackets are strings and spaces. Moreover, multiple spaces count as one, so you can suppress inline elements by replacing them with spaces.
I added some logic to remove the section anchors (^abc) that would otherwise be rendered as text. Here are some caveats:
Pandoc always parses the escape \^, so there is no way to differentiate between Obsidian anchors and a similar verbatim string.
Pandoc alters the ! before images and transclusions (but not with gfm), so the script cannot differentiate between them and wikilinks.
Use -implicit_header_references, otherwise in the unfortunate case where a wikilink is the same as a section title, it will be linked instead of being rendered as plain text.
Here's the script:
-- wikilinks.lua
-- remove wikilink identifiers and replace them with the link text
function clean (text)
  return text
    :gsub("%[%[([^|]-)%]%]", "%1") -- remove simple wikilinks
    :gsub("%[%[.-|(.-)%]%]", "%1") -- remove wikilinks with custom text
end
function Blocks(blocks)
  for _,elem in pairs(blocks) do
    if elem.t == "Para" then
      local start = nil
      for i, inline in ipairs(elem.content) do
        if inline.tag == "Str" then
          -- remove link identifiers
          inline.text = inline.text:gsub("^%^%w+", "")
          -- Pandoc always parses the escapes, so there is no way to tell
          -- ^ and \^ apart
        end
        -- select the range corresponding to a wikilink and substitute it
        if inline.tag == "Str" and inline.text:match("%[%[") then
          start = i
        end
        if inline.tag == "Str" and inline.text:match("%]%]") and start then
          local result = elem.content[start].text
          for j = start+1, i do
            if elem.content[j].tag == "Str" then
              result = result .. elem.content[j].text
            else -- if it's not a string, it's a Space
              result = result .. " "
            end
            elem.content[j] = pandoc.Space()
          end
          elem.content[start].text = clean(result)
          start = nil
        end
      end
    end
  end
  return blocks
end
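To try the two substitutions of clean() and the anchor removal outside pandoc, here is a rough Python transcription working on plain text. It is only an approximation: the Lua script operates on pandoc's Str tokens, whereas this sketch treats "start of a token" as "start of the string or preceded by whitespace", and the name strip_block_ids is made up for illustration.

```python
import re

# Lua "%[%[([^|]-)%]%]": simple wikilink, lazy match without a pipe.
SIMPLE = re.compile(r"\[\[([^|]*?)\]\]")
# Lua "%[%[.-|(.-)%]%]": wikilink with custom text after the pipe.
ALIASED = re.compile(r"\[\[.*?\|(.*?)\]\]")
# Lua "^%^%w+": an anchor such as ^abc at the start of a token.
BLOCK_ID = re.compile(r"(?<!\S)\^[A-Za-z0-9]+")


def clean(text: str) -> str:
    """Replace wikilinks with their link text, simple links first."""
    text = SIMPLE.sub(r"\1", text)
    return ALIASED.sub(r"\1", text)


def strip_block_ids(text: str) -> str:
    """Drop anchors like ^abc; a caret inside a word is kept."""
    return BLOCK_ID.sub("", text)


if __name__ == "__main__":
    print(clean("[[Note]] and [[Other|label]]"))
    print(strip_block_ids("Some paragraph ^abc123"))
```

The order of the two substitutions matters: the simple pattern refuses to cross a pipe, so aliased links are left for the second pass.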
Sorry to revive this old post, but since 2023/01/18 Pandoc offers two extensions, wikilinks_title_after_pipe and wikilinks_title_before_pipe, to handle wiki links.
For example:
pandoc -f markdown+wikilinks_title_after_pipe EXPORTED_MARKDOWN_FILE.pandoc.md -o DESTINATION_FILE.pdf
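If the call is scripted, note that these extensions belong to the input format (the reader), and that the PDF is selected through the output file name. A minimal Python sketch that only assembles the argument list is shown below; the helper name and the file names are placeholders, and nothing is executed here.

```python
import shlex


def build_pandoc_command(source: str, destination: str,
                         title_after_pipe: bool = True) -> list[str]:
    """Assemble a pandoc call that enables one of the two wikilink extensions."""
    extension = ("wikilinks_title_after_pipe" if title_after_pipe
                 else "wikilinks_title_before_pipe")
    return ["pandoc", "-f", "markdown+" + extension, source, "-o", destination]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(shlex.join(build_pandoc_command("note.md", "note.pdf")))
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where pandoc 3.0 or later and a PDF engine are installed. Obsidian's `[[target|alias]]` order corresponds to wikilinks_title_after_pipe.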
When I change the setting "Export files from HTML to Markdown" from HTML to Markdown, all the functionality for removing [[wiki-links]] formatting in the output PDF stops working.