ali-ramadhan / DocumenterCitations.jl

DocumenterCitations.jl uses Bibliography.jl to add support for BibTeX citations and references in documentation pages generated by Documenter.jl.
https://ali-ramadhan.github.io/DocumenterCitations.jl/dev
MIT License
65 stars 10 forks source link

Math mode data gets mangled #59

Open fingolfin opened 2 years ago

fingolfin commented 2 years ago

Right now we have this:

julia> DocumenterCitations.tex2unicode(raw"{$\overline{M}$}")
"{\$\\overlineM\$}"

But that's not right, it shouldn't strip those inner braces, as the result is invalid LaTeX and won't be rendered right.

This is caused by the following rule, I believe:

r"\{([[:alnum:]]+)\}" => s"\1",  # {<text>}     <text>  bracket stripping after applying all rules

Frankly I don't understand the purpose of this rule. Besides clearly being incorrect (in the sense that it changes semantics of the code, as seen in my example), it is quite unclear to me why one would want to do that, i.e. what the purpose of this substitution is? It was introduced by @LazyScholar in PR #49 but no explanation was given there.

LazyScholar commented 2 years ago

Sorry i have way too much on my hands right now. I saw the other issue and PR but choose not to participate bacause... well too much to do rn.

But i should at least try to explain my intention on that PR. I introduced that rule in order to strip out the grouping braces which are sometimes used in BibTex titles or publisher names (e.g. capital letter abbreviations) (see: BibTex.org : Text enclosed in braces ). I intended for it to be applied in the last step of substitutions.

But as you are stating, that example might be is doing it wrong. It depends on how you want to see the 'Text enclosed in braces'-Rule. I tried not to step into that (never ending 'have to fix that too') pitfall back then. One big problem (imo) with regular expression substitution rules are braces. You can not account for all cases with regex. And i did not want to implement a BibTex parser or even LaTeX parser. I was satisfied by adding some further cases well knowing that the package does not fully supports the syntax.

I just remembered that i had another PR prepared where i wanted to add r"\\\\\"\{\\i\}" => s"\u0069\u308", # \"{\i} ï Latin Small Letter I with Diaeresis link to the substitution rules. As i do not see any movement on the related PR in Documenter.jl you could ad that to the substitutions in case if you want to add LaTeX support.

Or you could use \={M} instead of $\overline{M}$?

fingolfin commented 2 years ago

@LazyScholar thank you for the explanation, now at least I understand the motivation and that helps finding alternatives.

I don't think removing {} blindly like this is easily salvageable, and I would suggest to remove it: as long as it is there, it always has a chance of wrecking user input, with no recourse open to the user. But if we instead don't remove it, then the user can easily deal with this, by tweaking the bib data suitably.

simonbyrne commented 2 years ago

Unfortunately, {} is pretty common in bibtex text, as its the desired way to enforce capitalization, e.g. https://tex.stackexchange.com/a/10775/245, which we would want to remove, so it's helpful to keep.

Honestly, at this point we probably need a proper TeX parser to do this properly. @Kolaru has a basic math-mode tex parser in MathTeXEngine.jl (https://github.com/Kolaru/MathTeXEngine.jl/tree/master/src/parser), it might be possible to adapt that to handle text mode as well.