crsh / citr

RStudio Addin to Insert Markdown Citations

citr creates incorrect bib-file when there are diacritics #53

Open ghost opened 5 years ago

ghost commented 5 years ago

I have an issue with diacritics in the bib-file that citr creates from my Zotero references. I run Zotero and connect R to it directly. When I want to knit a pdf with a bib-file that includes diacritics, R throws the following error:

! Undefined control sequence.
l.93 (Ã…\nobreakspaceirok
                         ý et al. 2011) 

Error: Failed to compile citr-test.tex. See https://yihui.name/tinytex/r/#debugging for debugging tips. See citr-test.log for more info.
Execution halted

However, when I paste the citation from a manual export, in which Unicode characters are written as plain-text LaTeX commands, into the bib-file created by citr, the pdf is created perfectly.


The paper I use in this case can be found here, but it is the case with any publication with diacritics.


Here is the reference created by manual export:

@article{siroky2011,
  title = {Life Cycle of Tortoise Tick {{Hyalomma}} Aegyptium under Laboratory Conditions},
  volume = {54},
  issn = {01688162},
  doi = {10.1007/s10493-011-9442-8},
  abstract = {The tortoise tick Hyalomma aegyptium has a typical three-host life-cycle. Whereas its larvae and nymphs are less host-specific feeding on a variety of tetrapods, tortoises of the genus Testudo are principal hosts of adults. Ticks retained this trait also in our study under laboratory conditions, while adults were reluctant to feed on mammalian hosts. Combination of feeding larvae and nymphs on guinea pigs and feeding of adults on Testudo marginata tortoises provided the best results. Feeding period of females was on average 25~days (range 17-44), whereas males remain after female engorgement on tortoise host. Female pre-oviposition period was 14~days (3-31), followed by 24~days of oviposition (18-29). Pre-eclosion and eclosion, both together, takes 31~days (21-43). Larvae fed 5~days (3-9), then molted to nymphs after 17~days (12-23). Feeding period of nymphs lasted 7~days (5-10), engorged nymphs molted to adults after 24~days (19-26). Sex ratio of laboratory hatched H. aegyptium was nearly equal (1:1.09). The average weight of engorged female was 0.95 (0.72-1.12)~g. The average number of laid eggs was 6,900 (6,524-7,532) per female, it was significantly correlated with weight of engorged female. Only 2.8\% of engorged larvae and 1.8\% of engorged nymphs remained un-molted and died. Despite the use of natural host species, feeding success of females reached only 45\%. The whole life-cycle was completed within 147~days (98-215).},
  number = {3},
  journaltitle = {Experimental and Applied Acarology},
  date = {2011},
  pages = {277-284},
  keywords = {_not_in_zettelkasten},
  author = {{\v S}irok\'y, Pavel and Erhart, Jan and Petr{\v z}elkov\'a, Kl\'ara J. and Kamler, Martin},
  file = {C\:\\Users\\Raoul\\Zotero\\storage\\6US6QFQF\\Široký et al. - 2011 - Life cycle of tortoise tick Hyalomma aegyptium under laboratory conditions.pdf},
  isbn = {0168-8162},
  eprinttype = {pmid},
  eprint = {21431927}
}

Here is the reference created by citr:

@Article{siroky2011,
  title = {Life Cycle of Tortoise Tick {{Hyalomma}} Aegyptium under Laboratory Conditions},
  volume = {54},
  issn = {01688162},
  abstract = {The tortoise tick Hyalomma aegyptium has a typical three-host life-cycle. Whereas its larvae and nymphs are less host-specific feeding on a variety of tetrapods, tortoises of the genus Testudo are principal hosts of adults. Ticks retained this trait also in our study under laboratory conditions, while adults were reluctant to feed on mammalian hosts. Combination of feeding larvae and nymphs on guinea pigs and feeding of adults on Testudo marginata tortoises provided the best results. Feeding period of females was on average 25~days (range 17-44), whereas males remain after female engorgement on tortoise host. Female pre-oviposition period was 14~days (3-31), followed by 24~days of oviposition (18-29). Pre-eclosion and eclosion, both together, takes 31~days (21-43). Larvae fed 5~days (3-9), then molted to nymphs after 17~days (12-23). Feeding period of nymphs lasted 7~days (5-10), engorged nymphs molted to adults after 24~days (19-26). Sex ratio of laboratory hatched H. aegyptium was nearly equal (1:1.09). The average weight of engorged female was 0.95 (0.72-1.12)~g. The average number of laid eggs was 6,900 (6,524-7,532) per female, it was significantly correlated with weight of engorged female. Only 2.8\% of engorged larvae and 1.8\% of engorged nymphs remained un-molted and died. Despite the use of natural host species, feeding success of females reached only 45\%. The whole life-cycle was completed within 147~days (98-215).},
  number = {3},
  journal = {Experimental and Applied Acarology},
  doi = {10.1007/s10493-011-9442-8},
  author = {Pavel {{\r A}{\nobreakspace}irok\'y} and Jan Erhart and Kl{\~A}{\textexclamdown}ra J. Petr{\v z}elkov{\a'a} and Martin Kamler},
  year = {2011},
  keywords = {_not_in_zettelkasten},
  pages = {277-284},
  file = {C\:\\Users\\Raoul\\Zotero\\storage\\6US6QFQF\\Široký et al. - 2011 - Life cycle of tortoise tick Hyalomma aegyptium under laboratory conditions.pdf},
  pmid = {21431927},
}

As you can see, the latter contains a lot of extra characters that even citr itself does not understand.
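A possible explanation for the extra characters, offered as an assumption rather than something confirmed in this thread: the UTF-8 bytes of Š and á were decoded as Latin-1 at some point and then converted to LaTeX commands. The minimal Python sketch below reproduces the character pairs seen in the citr entry. The function name `misdecode` is made up for illustration and is not part of citr or RefManageR.

```python
def misdecode(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Latin-1 (the suspected mistake)."""
    return text.encode("utf-8").decode("latin-1")

# "Š" (U+0160) is the two bytes C5 A0 in UTF-8.
# Read as Latin-1 they become "Å" (U+00C5) plus a no-break space (U+00A0),
# which lines up with {\r A}{\nobreakspace} in the citr entry.
s_caron = misdecode("\u0160")

# "á" (U+00E1) is the two bytes C3 A1 in UTF-8.
# Read as Latin-1 they become "Ã" (U+00C3) plus "¡" (U+00A1),
# which lines up with {\~A}{\textexclamdown}.
a_acute = misdecode("\u00e1")

print([hex(ord(c)) for c in s_caron])
print([hex(ord(c)) for c in a_acute])
```

This only shows that the garbled sequences are consistent with an encoding mix-up; it does not say which step in the citr, RefManageR or Zotero chain introduces it.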

I would like to know how I can resolve this, because I like to create a new bib-file for each project I create, which then only contains the references used in that project.

crsh commented 4 years ago

Hi @raoul-van-oosten, sorry for the late reply. I suspect this problem may be introduced by RefManageR::ReadBib(), which citr uses under the hood. Could you try reading the bib-entry with that function to see whether that already yields the same problem? If not, try writing the imported file to disk with RefManageR::WriteBib(). I think you may see the same problems there. If so, this issue would best be raised with the maintainers of that package.

ghost commented 4 years ago

Hi @crsh, thanks for your reply. I am currently getting the following error when I try to connect citr to Zotero:

> citr:::insert_citation()
Loading required package: shiny

Listening on http://127.0.0.1:7905
Warning: Error in do_read_bib: lex fatal error:
input buffer overflow, can't enlarge buffer because scanner uses REJECT

  127: do_read_bib
  126: RefManageR::ReadBib
  125: import_bbt
  124: load_betterbiblatex_bib
  120: <reactive>
  104: bibliography
   98: renderText
   97: func
   84: origRenderFunc
   83: output$read_error
    3: shiny::runApp
    2: runGadget
    1: citr:::insert_citation
Warning: Error in do_read_bib: lex fatal error:
input buffer overflow, can't enlarge buffer because scanner uses REJECT

  49: <Anonymous>
Warning: Error in do_read_bib: lex fatal error:
input buffer overflow, can't enlarge buffer because scanner uses REJECT

  49: <Anonymous>

When I do so, RStudio creates a new .bib file with a cryptic name and my entire library in it each time I run it. It did not do so before. When I change bibliography: references.bib to this file (like myZGIaUWrWhgLPPmP1POclSPqnCrN9Mg.bib), the references are inserted correctly. For instance, [@creach1997] has as first author {Cr{\'e}ach, V., which is transcribed correctly. But the file that I export manually from Zotero uses Cr\'each, V., which I prefer.

However, it is inconvenient and strange that the whole library is added, that a cryptic name is given and that using references.bib does not work. I have not yet figured out what that means or how to fix it. It seems to be this issue so I will have a look there.

When I run RefManageR::ReadBib() I get something similar:

> RefManageR::ReadBib("references.bib")
Error in do_read_bib(file, encoding = .Encoding, srcfile) : 
  lex fatal error:
fatal flex scanner internal error--end of buffer missed

crsh commented 4 years ago

Could you try restarting the R session? This is a known problem with bibtex (see here and here).

To simplify things for now, let's leave Zotero out of this. Could you just create a new bib-file with the problematic entry you give above and try to read it with ReadBib()?

ghost commented 4 years ago

Restarting did not fix it (I actually rebooted my whole machine, too).

I have created a bib-file with siroky2011 in three formats (a: manual export; b: previous automatic import of citr via Zotero; c: current import by citr via Zotero), and run RefManageR::ReadBib(). This shows info on all three versions, with the one citr imports from Zotero being different. Knitting to pdf works with a and c but not b. Please find details below.

I see a problem with your suggested approach. When I create a bib-file myself, the problematic entries are not problematic. But when I let citr create it from Zotero, the entries become problematic. So we can try and fix the way citr handles the problematic entries, but that is not the way I want them to show up in the bib-file.


Bib-file with siroky2011 in three formats. siroky2011a is the manual export and the way I would like it to be. siroky2011b is the incorrect export by citr previously, and siroky2011c is what it does currently (by creating the entire library).

@article{siroky2011a,
  title = {Life Cycle of Tortoise Tick {{Hyalomma}} Aegyptium under Laboratory Conditions},
  volume = {54},
  issn = {01688162},
  doi = {10.1007/s10493-011-9442-8},
  abstract = {The tortoise tick Hyalomma aegyptium has a typical three-host life-cycle. Whereas its larvae and nymphs are less host-specific feeding on a variety of tetrapods, tortoises of the genus Testudo are principal hosts of adults. Ticks retained this trait also in our study under laboratory conditions, while adults were reluctant to feed on mammalian hosts. Combination of feeding larvae and nymphs on guinea pigs and feeding of adults on Testudo marginata tortoises provided the best results. Feeding period of females was on average 25~days (range 17-44), whereas males remain after female engorgement on tortoise host. Female pre-oviposition period was 14~days (3-31), followed by 24~days of oviposition (18-29). Pre-eclosion and eclosion, both together, takes 31~days (21-43). Larvae fed 5~days (3-9), then molted to nymphs after 17~days (12-23). Feeding period of nymphs lasted 7~days (5-10), engorged nymphs molted to adults after 24~days (19-26). Sex ratio of laboratory hatched H. aegyptium was nearly equal (1:1.09). The average weight of engorged female was 0.95 (0.72-1.12)~g. The average number of laid eggs was 6,900 (6,524-7,532) per female, it was significantly correlated with weight of engorged female. Only 2.8\% of engorged larvae and 1.8\% of engorged nymphs remained un-molted and died. Despite the use of natural host species, feeding success of females reached only 45\%. The whole life-cycle was completed within 147~days (98-215).},
  number = {3},
  journaltitle = {Experimental and Applied Acarology},
  date = {2011},
  pages = {277-284},
  keywords = {_not_in_zettelkasten},
  author = {{\v S}irok\'y, Pavel and Erhart, Jan and Petr{\v z}elkov\'a, Kl\'ara J. and Kamler, Martin},
  file = {C\:\\Users\\Raoul\\Zotero\\storage\\6US6QFQF\\Široký et al. - 2011 - Life cycle of tortoise tick Hyalomma aegyptium under laboratory conditions.pdf},
  isbn = {0168-8162},
  eprinttype = {pmid},
  eprint = {21431927}
}

@Article{siroky2011b,
  title = {Life Cycle of Tortoise Tick {{Hyalomma}} Aegyptium under Laboratory Conditions},
  volume = {54},
  issn = {01688162},
  abstract = {The tortoise tick Hyalomma aegyptium has a typical three-host life-cycle. Whereas its larvae and nymphs are less host-specific feeding on a variety of tetrapods, tortoises of the genus Testudo are principal hosts of adults. Ticks retained this trait also in our study under laboratory conditions, while adults were reluctant to feed on mammalian hosts. Combination of feeding larvae and nymphs on guinea pigs and feeding of adults on Testudo marginata tortoises provided the best results. Feeding period of females was on average 25~days (range 17-44), whereas males remain after female engorgement on tortoise host. Female pre-oviposition period was 14~days (3-31), followed by 24~days of oviposition (18-29). Pre-eclosion and eclosion, both together, takes 31~days (21-43). Larvae fed 5~days (3-9), then molted to nymphs after 17~days (12-23). Feeding period of nymphs lasted 7~days (5-10), engorged nymphs molted to adults after 24~days (19-26). Sex ratio of laboratory hatched H. aegyptium was nearly equal (1:1.09). The average weight of engorged female was 0.95 (0.72-1.12)~g. The average number of laid eggs was 6,900 (6,524-7,532) per female, it was significantly correlated with weight of engorged female. Only 2.8\% of engorged larvae and 1.8\% of engorged nymphs remained un-molted and died. Despite the use of natural host species, feeding success of females reached only 45\%. The whole life-cycle was completed within 147~days (98-215).},
  number = {3},
  journal = {Experimental and Applied Acarology},
  doi = {10.1007/s10493-011-9442-8},
  author = {Pavel {{\r A}{\nobreakspace}irok\'y} and Jan Erhart and Kl{\~A}{\textexclamdown}ra J. Petr{\v z}elkov{\a'a} and Martin Kamler},
  year = {2011},
  keywords = {_not_in_zettelkasten},
  pages = {277-284},
  file = {C\:\\Users\\Raoul\\Zotero\\storage\\6US6QFQF\\Široký et al. - 2011 - Life cycle of tortoise tick Hyalomma aegyptium under laboratory conditions.pdf},
  pmid = {21431927},
}

@article{siroky2011c,
  title = {Life Cycle of Tortoise Tick {{Hyalomma}} Aegyptium under Laboratory Conditions},
  volume = {54},
  issn = {01688162},
  abstract = {The tortoise tick Hyalomma aegyptium has a typical three-host life-cycle. Whereas its larvae and nymphs are less host-specific feeding on a variety of tetrapods, tortoises of the genus Testudo are principal hosts of adults. Ticks retained this trait also in our study under laboratory conditions, while adults were reluctant to feed on mammalian hosts. Combination of feeding larvae and nymphs on guinea pigs and feeding of adults on Testudo marginata tortoises provided the best results. Feeding period of females was on average 25~days (range 17-44), whereas males remain after female engorgement on tortoise host. Female pre-oviposition period was 14~days (3-31), followed by 24~days of oviposition (18-29). Pre-eclosion and eclosion, both together, takes 31~days (21-43). Larvae fed 5~days (3-9), then molted to nymphs after 17~days (12-23). Feeding period of nymphs lasted 7~days (5-10), engorged nymphs molted to adults after 24~days (19-26). Sex ratio of laboratory hatched H. aegyptium was nearly equal (1:1.09). The average weight of engorged female was 0.95 (0.72-1.12)~g. The average number of laid eggs was 6,900 (6,524-7,532) per female, it was significantly correlated with weight of engorged female. Only 2.8\% of engorged larvae and 1.8\% of engorged nymphs remained un-molted and died. Despite the use of natural host species, feeding success of females reached only 45\%. The whole life-cycle was completed within 147~days (98-215).},
  number = {3},
  journal = {Experimental and Applied Acarology},
  doi = {10.1007/s10493-011-9442-8},
  author = {{\v S}irok{\'y}, Pavel and Erhart, Jan and Petr{\v z}elkov{\'a}, Kl{\'a}ra J. and Kamler, Martin},
  year = {2011},
  keywords = {_not_in_zettelkasten},
  pages = {277-284},
  file = {C\:\\Users\\Raoul\\Zotero\\storage\\6US6QFQF\\Široký et al. - 2011 - Life cycle of tortoise tick Hyalomma aegyptium under laboratory conditions.pdf},
  pmid = {21431927}
}

Knitting of a and c creates perfectly fine citations, but b gives the problem as before:

processing file: citr-test.Rmd
"C:/PROGRA~1/Pandoc/pandoc" +RTS -K512m -RTS citr-test.utf8.md --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output citr-test.tex --template "D:\User folders\Raoul\Documents\R\win-library\3.6\rmarkdown\rmd\latex\default-1.17.0.2.tex" --highlight-style tango --pdf-engine pdflatex --variable graphics=yes --variable "geometry:margin=1in" --variable "compact-title:yes" --filter pandoc-citeproc 
output file: citr-test.knit.md

! Undefined control sequence.
l.95 (Ã…\nobreakspaceirok
                         ý et al. 2011) 

Error: Failed to compile citr-test.tex. See https://yihui.name/tinytex/r/#debugging for debugging tips. See citr-test.log for more info.
Execution halted

Running RefManageR::ReadBib() gives the following:

> RefManageR::ReadBib("references.bib")
[1] P. Å irok\'y, J. Erhart, K. J. Petrželková, et al. “Life Cycle of Tortoise Tick Hyalomma Aegyptium under Laboratory Conditions”.
In: _Experimental and Applied Acarology_ 54.3 (2011), pp. 277-284. ISSN: 01688162. DOI: 10.1007/s10493-011-9442-8. pmid: 21431927.

[2] P. Å irok\'y, J. Erhart, K. J. Petrželková, et al. “Life Cycle of Tortoise Tick Hyalomma Aegyptium under Laboratory Conditions”.
In: _Experimental and Applied Acarology_ 54.3 (2011), pp. 277-284. ISSN: 01688162. DOI: 10.1007/s10493-011-9442-8.

[3] P. Å irok\'y, J. Erhart, K. J. P. á, et al. “Life Cycle of Tortoise Tick Hyalomma Aegyptium under Laboratory Conditions”. In:
_Experimental and Applied Acarology_ 54.3 (2011), pp. 277-284. ISSN: 01688162. DOI: 10.1007/s10493-011-9442-8.

In this case, it seems only [3] is different, and that seems to be siroky2011b (when I use citr to insert from the bibliography, this one is different).

crsh commented 4 years ago

> Restarting did not fix it (I actually rebooted my whole machine, too).

Restarting will only make sure that it could work in principle. Once you get that error, it will make loading any bib-file impossible (problematic or not) until you restart.

> I have created a bib-file with siroky2011 in three formats (a: manual export; b: previous automatic import of citr via Zotero; c: current import by citr via Zotero)

The point here would be to first simply read in the correct bib-file and see if that gives the same character conversion problem as we are seeing in citr. The output for all three files suggests that the character that is causing LaTeX to error (Å) is present after reading in each of those files. As elaborated here, you can compare the imported characters for ReadBib() and readLines() to see whether the conversions done by ReadBib() are responsible (which I strongly suspect they are). If you find this to be the case, I have no way to fix this problem and you need to bring it to the attention of the RefManageR maintainers.
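The comparison described above boils down to asking whether the parsed name is just the raw name's UTF-8 bytes read under another encoding. As a hedged illustration in Python (not the R workflow itself), the sketch below reverses a suspected Latin-1 misread and reports whether that yields a different string. The helper `repair_mojibake` is hypothetical, and the sample string assumes the blank after Å in the ReadBib() output is a no-break space.

```python
from typing import Optional

def repair_mojibake(text: str) -> Optional[str]:
    """Return the repaired string if text looks like UTF-8 misread as Latin-1, else None."""
    try:
        # Undo the suspected mistake: back to bytes via Latin-1, then decode as UTF-8.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not round-trip, so it was probably not misread this way.
        return None
    return repaired if repaired != text else None

# Name as printed by ReadBib() in this thread, assuming U+00A0 after the Å.
parsed = "\u00c5\u00a0irok\\'y"
print(repair_mojibake(parsed))

# A correctly decoded name is left alone.
print(repair_mojibake("Petr\u017eelkov\u00e1"))
```

If the first call returns the name with Š restored, that supports the idea that the garbling comes from an encoding mix-up during import rather than from the bib-file on disk.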

ghost commented 4 years ago

OK, it turns out a new citation with a gigantic summary in my Zotero library was the culprit for the lex error. After removing it and restarting RStudio, citr is able to connect to Zotero as usual. The diacritics problem still occurs as before.
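For anyone hitting the same lex error, a rough way to spot such an entry is to look for unusually long lines in the exported bib-file. The Python sketch below is only an illustration under two assumptions: each field sits on its own line (as in the entries shown in this thread), and the error is presumably triggered by a single very long field. The function name `longest_line_per_entry` and the sample keys are made up.

```python
import re
from typing import Dict, List, Tuple

# Matches the first line of an entry such as "@article{siroky2011," and captures the key.
ENTRY_START = re.compile(r"^@\w+\{([^,\s]+),")

def longest_line_per_entry(bib_text: str) -> List[Tuple[str, int]]:
    """Return (citation key, longest line length) pairs, longest first."""
    longest: Dict[str, int] = {}
    current = None
    for line in bib_text.splitlines():
        match = ENTRY_START.match(line)
        if match:
            current = match.group(1)
            longest[current] = 0
        elif current is not None:
            longest[current] = max(longest[current], len(line))
    return sorted(longest.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample: one ordinary entry and one with a very long abstract.
sample = """@article{short2011,
  title = {A short title},
}
@article{huge2019,
  abstract = {""" + "x" * 20000 + """},
}
"""
report = longest_line_per_entry(sample)
print(report[0][0])
```

The entry at the top of the report is the first candidate to remove or shorten before trying the import again.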

ghost commented 4 years ago

Indeed the characters get messed up. It seems Š is created with {\v S}. Manual export from Zotero does this correctly, but citr + Zotero create {{\r A}{\nobreakspace}irok\'y}. Indeed {\r A} creates Å, but the whole name is inserted in curly braces, too. That trips up the knit.

But you are right that RefManageR::ReadBib() shows P. Å, too, so it goes wrong there too. The citr plugin already shows this:

[Screenshot: the citr addin showing the garbled author name]

So you suggest that I contact the RefManageR maintainers?

crsh commented 4 years ago

Yes, I don't think I can fix this on the citr end. Thank you for helping to track down the source of the problem.