crsh / citr

RStudio Addin to Insert Markdown Citations

Encoding issue (not UTF-8) and repeated entries #67

Open GegznaV opened 4 years ago

GegznaV commented 4 years ago

Describe the bug
1) Encoding issue in displaying non-ASCII characters.
2) Repeated entries of the same source (in the bib file each is entered only once).

To Reproduce
Call the citr RStudio add-in from the attached project: citr--UTF-8--bug.zip

Expected behavior
1) Correct encoding (UTF-8) for all characters.
2) Each entry is shown exactly once.

Screenshots
[image]

Encoding is set to UTF-8 in settings:
[image]

Additional context

R             3.6.3
RStudio       1.2.5033
citr          0.3.2
- Session info ----------------------------------
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       Europe/Helsinki             
 date     2020-03-15

GegznaV commented 4 years ago

Another encoding issue may be here:

https://github.com/crsh/citr/blob/0afd6f97b35b294c655fa5831fb9db3d818e70c4/R/insert_citation.R#L107

Shouldn't it be:

parent_document <- readLines(parents_path[parents], warn = FALSE, encoding = getOption("citr.encoding"))

GegznaV commented 4 years ago

#68 solves the issue of duplicated entries. It also addresses one more potential encoding issue.

The encoding issue reported above comes from RefManageR::ReadBib(), which does not respect the encoding value passed to it:

RefManageR::ReadBib("book.bib", check = FALSE, .Encoding = "UTF-8")

## [1] V. Čekanavičius and G. Murauskas. _Statistika ir jos taikymai I_. Vilnius:
## TEV, 2006, p. 240. ISBN: 9986-546-93-1.

##  / truncated /

## [5] V. Janilionis, V. Morkevicius, and R. Rauleckas. “III dalis. StatistinÄ—s
## analizÄ—s pavyzdžių naudojant pavyzdin\ce skaitmenin\ce duomenų baz\ce
## medžiaga”. In: _StatistinÄ— kiekybinių duomenų analizÄ— su SPSS ir Stata_.
## Kaunas, 2008. Chap. 10. Daugia, p. 393. <URL:
## http://www.lidata.eu/index.php?file=files/mokymai/stat/stat.html{\&}course{\_}file=stat{\_}III{\_}10.html>.

#  / truncated /

## Warning messages:
## 1: Janilionis2008-III-10: unknown macro '\c' 
## 2: Janilionis2008-III-10: unknown macro '\c' 
## 3: Janilionis2008-III-10: unknown macro '\c' 

This encoding issue is related to #53
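
The garbled "Ä" sequences in the output above are consistent with UTF-8 bytes being decoded as Windows-1252, which is the ctype locale in the session info. The following is a minimal R sketch of that mechanism only; it is an illustration under that assumption, not a reproduction of what RefManageR does internally.

```r
# UTF-8 encodes "ė" (U+0117) as the two bytes 0xC4 0x97.
utf8_bytes <- as.raw(c(0xC4, 0x97))
raw_string <- rawToChar(utf8_bytes)

# Wrong: treat the bytes as Windows-1252 and convert them to UTF-8.
# 0xC4 becomes "Ä" (U+00C4) and 0x97 becomes a dash (U+2014).
garbled <- iconv(raw_string, from = "CP1252", to = "UTF-8")

# Right: declare that the bytes already are UTF-8.
correct <- raw_string
Encoding(correct) <- "UTF-8"

print(garbled)
print(correct)
```

Declaring the encoding with `Encoding<-` leaves the bytes untouched, whereas `iconv()` rewrites them, which is why a wrong `from` value produces two characters where there should be one.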

crsh commented 4 years ago

Thanks for the PR. I've hardcoded the expected encoding of parent documents to UTF-8, because rmarkdown assumes UTF-8 encoding anyway and because the option citr.encoding specifies the encoding of the Bib-file, not of the parent documents.
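
As a rough illustration of reading a document as UTF-8 regardless of the session's native locale, here is a minimal sketch. The temporary file and its contents are made up for the example, and this is not citr's actual code.

```r
# Hypothetical stand-in for a parent R Markdown document.
doc_path <- tempfile(fileext = ".Rmd")
original <- c("---", "title: \"Statistin\u0117 analiz\u0117\"", "---")

# Write the UTF-8 bytes unchanged, independent of the session locale.
writeLines(enc2utf8(original), doc_path, useBytes = TRUE)

# Read the file back and mark non-ASCII strings as UTF-8.
parent_document <- readLines(doc_path, warn = FALSE, encoding = "UTF-8")

print(Encoding(parent_document))
```

Note that the `encoding` argument of `readLines()` only marks the strings it returns; it does not re-encode the file, so it helps only when the file really is UTF-8.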

crsh commented 4 years ago

Hi @GegznaV, has this issue been resolved (except for the upstream encoding issue)?

GegznaV commented 4 years ago

It seems that only the upstream issue is left.