Open blegat opened 1 day ago
I’ll look into this at some point.
It seems like the zotero-better-bibtex plugin has an option to keep Unicode.
You should absolutely 100% enable that. I’m actually really confused about this statement in their README:
“Unfortunately, for those shackled to BibTeX and who cannot (yet) move to BibLaTeX, unicode is a major PITA.”
I have all my .bib files in Unicode, and I’m using plain BibTeX, not BibLaTeX. It has “just worked” for the last 15 years (maybe since pdflatex started to exist?). As far as I can tell, it’s just not a problem anymore, and nobody should use these TeX escapes anymore.
Good point, if I untick the checkbox "Export unicode as plain-text..." then I get rid of the errors. If I also select "in the 'url' field" (below in the screenshot), I also get rid of the warnings complaining that there is an "urldate" without a "url", because by default "Add URLs to BibTeX export" was "No". So I think you can recommend these settings to Zotero users.
I also tried BibLaTeX export but I got an error, see https://github.com/Humans-of-Julia/BibInternal.jl/issues/33
I got similar errors with DocumenterCitations v1.3.5 yesterday (things were fine with older versions). I was able to fix them by removing TeX syntax: https://github.com/CliMA/CloudMicrophysics.jl/pull/483
Thank you!
The reason things that worked in v1.3.4 stopped working in v1.3.5 is that the solution to #78 was to try to convert LaTeX to Unicode before obtaining the initials for first names. That means first names are now processed, whereas they weren't before, and anything in a first name that trips up the conversion breaks it. I actually ran into that myself.
Ultimately, the bottom line is that DocumenterCitations requires Unicode. Any handling of LaTeX commands will always be an incomplete and heuristic fallback, and not officially supported.
For […]
author = {{\"U}nl{\"u}, {\c C}a{\u g}lar}
[…] I get […] Premature end of tex string: BoundsError("{\\c", 4)
This particular case seems to be a bug in Bibliography.jl: https://github.com/Humans-of-Julia/BibParser.jl/issues/39
I also think that zotero-better-bibtex isn't really using the "correct" escape sequences here. They should probably stick to the ones officially supported by BibTeX. For this example, that would be
@misc{Unlu2024,
title = {More issues with escaped unicode},
author = {\"{U}nl\"{u}, \c{C}a\u{g}lar},
year = {2024},
note = {Bug Report #85},
}
which works fine.
When I try
DocumenterCitations.tex_to_markdown(raw"{\"U}nl{\"u}, {\c C}a{\u g}lar")
I get "\"Unl\"u, Çağlar", which seems indeed weird because \"u is not replaced by the unicode character.
No, that's actually an issue with the raw string: Raw strings in Julia aren't quite as raw as one might think: quotes still have to be escaped, and then the escape has to be escaped. You'd have to write that as
@test tex_to_markdown(raw"{\\\"U}nl{\\\"u}, {\c C}a{\u g}lar") == "Ünlü, Çağlar"
which works.
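The escaping rule can be checked in plain Julia without loading DocumenterCitations. The following is a minimal sketch using only Base; the variable names are made up for illustration. It shows that `\"` inside a raw string yields a bare quote, so the backslash never reaches the TeX converter, while `\\\"` yields the intended backslash followed by a quote.

```julia
# Base Julia only; the variable names are illustrative.

# Inside raw"...", the sequence \" produces a bare double quote:
# the backslash is consumed by the string literal itself.
lost_backslash = raw"{\"U}"
@assert lost_backslash == "{\"U}"      # the four characters { " U }

# To keep the backslash, it has to be escaped as well: \\\" gives \ followed by ".
kept_backslash = raw"{\\\"U}"
@assert kept_backslash == "{\\\"U}"    # the five characters { \ " U }

# Backslashes that do not precede a quote stay literal, as expected from a raw string.
cedilla = raw"{\c C}"
@assert cedilla == "{\\c C}"

println(lost_backslash)   # {"U}
println(kept_backslash)   # {\"U}
```

Reading the entry from a .bib file avoids the problem entirely, since no Julia string literal is involved in that case.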
@trontrytel
I got similar errors with DocumenterCitations v1.3.5 yesterday (things were fine with older versions). I was able to fix them by removing TeX syntax: https://github.com/CliMA/CloudMicrophysics.jl/pull/483
The only entry I can reproduce as failing is Lehtinen2007, and that's failing due to the same bug in Bibliography: https://github.com/Humans-of-Julia/BibParser.jl/issues/39#issuecomment-2480606573
Unfortunately, your "fix" of removing the braces is actually not correct: it changes the last name "Dal Maso" to "Maso" with "Dal" as a middle name. The correct way to handle this is to use the "Last, First" format.
@article{Lehtinen2007,
title = {Estimating nucleation rates from apparent particle formation rates and vice versa: Revised formulation of the Kerminen–Kulmala equation},
author = {Lehtinen, Kari E.J. and Dal Maso, Miikka and Kulmala, Markku and Kerminen, Veli-Matti},
journal = {Journal of Aerosol Science},
volume = {38},
number = {9},
pages = {988-994},
year = {2007},
doi = {10.1016/j.jaerosci.2007.06.009}
}
I strongly recommend always using that format (and making sure that any automatic exporter uses it).
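To illustrate why the two name formats differ, here is a deliberately simplified sketch of BibTeX-style name splitting. The function `split_name` is hypothetical and is not part of DocumenterCitations, Bibliography.jl, or BibParser.jl; it ignores braces, lowercase "von" particles, and "Jr" parts, and only mimics the rule that, without a comma, the final word is taken as the last name.

```julia
# Hypothetical helper for illustration only; NOT the parser used by any of the
# packages discussed here. Braces, "von" particles and "Jr" parts are ignored.
function split_name(name::AbstractString)
    if occursin(',', name)
        # "Last, First": everything before the first comma is the family name.
        family, given = strip.(split(name, ','; limit=2))
        return (given=String(given), family=String(family))
    else
        # "First Last": only the final whitespace-separated word is the family name.
        tokens = split(name)
        return (given=join(tokens[1:end-1], " "), family=String(tokens[end]))
    end
end

unbraced = split_name("Miikka Dal Maso")
comma_form = split_name("Dal Maso, Miikka")

println(unbraced)     # "Dal" ends up with the given names
println(comma_form)   # "Dal Maso" is kept together as the family name
```

Under these simplifying assumptions, the "Last, First" form is the only one of the two that keeps a capitalized multi-word family name together without relying on braces.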
So this doesn't really seem actionable on my side, but I'll keep this issue open until https://github.com/Humans-of-Julia/BibParser.jl/issues/39 is resolved.
Meanwhile, there's some additional testing in b8c5de304741943a142afa3e1552d4d3995f5269.
Follow up from https://github.com/JuliaDocs/DocumenterCitations.jl/issues/78
With the script […] for […] I get […]
When I try
DocumenterCitations.tex_to_markdown(raw"{\"U}nl{\"u}, {\c C}a{\u g}lar")
I get "\"Unl\"u, Çağlar", which seems indeed weird because \"u is not replaced by the unicode character.
With […] I get […], but when I do
DocumenterCitations.tex_to_markdown(raw"Mikolov, Tom{\'a}{\v s}")
I correctly get "Mikolov, Tomáš".