cboettig / knitcitations

Generate citations for knitr markdown and html files

Encoding issues with citations (Windows only) #103

Closed lcolladotor closed 7 years ago

lcolladotor commented 7 years ago

Hi,

I've been using knitcitations for a while to handle citations in HTML vignettes. I had been using knitcitations::read.bibtex() until I realized that it no longer reads the entries in the order they were given in the bib file**. So I made a change and it all works... except on Windows. I finally updated my R installation on a Windows laptop and saw that the problem is with encoding.

This short code reproduces the issue:

## Load package
library('knitcitations')

## Tries to cite, prints package name and error when it fails
check_bib <- function() {
    xx <- sapply(bib, function(x) {
        tryCatch(citep(x), error = function(e) {
            message(paste('found an error attempting to cite', names(x)))
            print(e)
        })
    })
}

## list of citations
bib <- c(knitcitations = citation('knitcitations'),
    IRanges = citation('IRanges'),
    S4Vectors = citation('S4Vectors'))
check_bib()
citep(bib[['S4Vectors']])

## Error message:
Error in nchar(aut) : invalid multibyte string, element 1

## Entry that fails
> bib[['S4Vectors']]
Pag<U+653C><U+3E38>s H, Lawrence M and Aboyoun P (2017). _S4Vectors: S4 implementation of vector-like and list-like objects_. R package version 0.15.10.

I see that knitcitations::write.bibtex() uses a "?" in authors in situations like this which is why I didn't notice this issue before. From https://cran.r-project.org/doc/manuals/R-exts.html#The-DESCRIPTION-file I see that 'Encoding' in the DESCRIPTION file is used for the citation and I do see "Encoding: UTF-8" in the S4Vectors DESCRIPTION file.
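
For context, an "invalid multibyte string" error from nchar() generally means the bytes of a string are not valid in the encoding R assumes for it. Below is a minimal base-R sketch of that situation, not a diagnosis of this specific report. The byte vector is a hand-built Latin-1 spelling of "Pagès", which is an assumption made purely for illustration; validUTF8() and iconv() are then used to detect and re-encode it.

```r
## Hand-built bytes for "Pagès" in Latin-1 (0xE8 is "è"); illustrative assumption,
## not the actual bytes produced by citation('S4Vectors') on Windows
latin1_name <- rawToChar(as.raw(c(0x50, 0x61, 0x67, 0xe8, 0x73)))

## These bytes are not a valid UTF-8 sequence, so treating them as UTF-8 fails
is_valid_before <- validUTF8(latin1_name)

## Re-encode explicitly from Latin-1 to UTF-8
fixed <- iconv(latin1_name, from = "latin1", to = "UTF-8")

## After conversion the string is valid UTF-8 and nchar() can count its characters
is_valid_after <- validUTF8(fixed)
n_chars <- nchar(fixed)
```

Whether a conversion like this can be applied to the author field before citep() sees it depends on how the citation object is built, so treat it as a way to inspect the problem rather than a confirmed fix.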

I get this error with GenomeInfoDb, AnnotationDbi, S4Vectors and SummarizedExperiment (details and reproducibility info at https://gist.github.com/anonymous/a8c6374b381dc9c27f55487756cb4e1b) across the different vignettes I maintain. But I don't get it with IRanges, GenomicRanges and other packages where Hervé Pagès is an author (those packages cite the 2013 PLoS paper). For example, the IRanges package has an inst/CITATION file that uses citEntry( , textVersion = "Pag\\es"). So, specifying an inst/CITATION file works.
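
A rough sketch of what an explicit entry of this kind could look like, using utils::bibentry() with the accented characters written as \u escapes so the source stays pure ASCII. The field values are copied from the S4Vectors citation shown above; the use of \u escapes is this sketch's own assumption and has not been checked against the Windows setup described here.

```r
## Hypothetical explicit citation entry; not taken from any package's inst/CITATION
entry <- utils::bibentry(
    bibtype = "Manual",
    title = "S4Vectors: S4 implementation of vector-like and list-like objects",
    author = c(
        utils::person("Herv\u00e9", "Pag\u00e8s"),
        utils::person("Michael", "Lawrence"),
        utils::person("Patrick", "Aboyoun")
    ),
    year = "2017",
    note = "R package version 0.15.10"
)

## Check that the first author's family name is stored as valid UTF-8
first_family <- entry$author[1]$family
family_is_valid <- validUTF8(first_family)
```

Writing the characters as escapes keeps the file independent of the encoding it is saved in, which is the main reason to prefer them in a sketch like this.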

> citep(bib[['IRanges']])
[1] "(Lawrence, Huber, Pagès, et al., 2013)"

I imagine that there is a way to deal with the encoding problem properly but I haven't been able to find it. If you have ideas on how I can fix this please let me know.

Thanks! Leo

** As you can see below read.bibtex() changes the order of the citations, so I can't cite them later using citep().

> write.bibtex(bib, file = 'test.bib')
Writing 3 Bibtex entries ... OK
Results written to file 'test.bib'
## test.bib contents
@Manual{boettiger2017knitcitations,
  title = {knitcitations: Citations for 'Knitr' Markdown Files},
  author = {Carl Boettiger},
  year = {2017},
  note = {R package version 1.0.8},
  url = {https://CRAN.R-project.org/package=knitcitations},
}

@Article{lawrence2013software,
  title = {Software for Computing and Annotating Genomic Ranges},
  author = {Michael Lawrence and Wolfgang Huber and Herv\'e Pag\`es and Patrick Aboyoun and Marc Carlson and Robert Gentleman and Martin Morgan and Vincent Carey},
  year = {2013},
  journal = {{PLoS} Computational Biology},
  volume = {9},
  issue = {8},
  doi = {10.1371/journal.pcbi.1003118},
  url = {http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118},
}

@Manual{pags2017s4vectors,
  title = {S4Vectors: S4 implementation of vector-like and list-like objects},
  author = {?},
  year = {2017},
  note = {R package version 0.15.10},
}
## read.bibtex() changes the order

> read.bibtex('test.bib')
[1] ? _S4Vectors: S4 implementation of vector-like and list-like objects_. R package version 0.15.10. 2017.

[2] C. Boettiger. _knitcitations: Citations for 'Knitr' Markdown Files_. R package version 1.0.8. 2017. <URL:
https://CRAN.R-project.org/package=knitcitations>.

[3] M. Lawrence, W. Huber, H. Pagès, et al. “Software for Computing and Annotating Genomic Ranges”. In: _PLoS Computational Biology_ 9 (8 2013). DOI:
10.1371/journal.pcbi.1003118. <URL: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118}.>
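
One possible way around the reordering is to look entries up by their BibTeX key instead of by position. The sketch below uses a plain named list as a stand-in for the parsed entries; whether the object returned by read.bibtex() supports the same key-based subsetting is an assumption that would need checking. The keys are the ones written to test.bib above.

```r
## Stand-in for parsed entries, deliberately in the shuffled order shown above.
## A plain named list is used; the real return type of read.bibtex() may differ.
entries <- list(
    pags2017s4vectors = "S4Vectors",
    boettiger2017knitcitations = "knitcitations",
    lawrence2013software = "Software for Computing and Annotating Genomic Ranges"
)

## Keys in the order they appear in test.bib
wanted <- c("boettiger2017knitcitations", "lawrence2013software",
    "pags2017s4vectors")

## Subset by key so the position in the parsed object no longer matters
in_file_order <- entries[wanted]
```

With the entries addressed by key, a later citation call can refer to a specific entry regardless of the order in which the file was parsed.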

Extra info for GitHub issue

lcolladotor commented 7 years ago

I also posted the above message to the bioc-devel mailing list (https://stat.ethz.ch/pipermail/bioc-devel/2017-October/011800.html), cc'ing Hervé Pagès.

cboettig commented 7 years ago

Yeah, encoding issues are tough. Looks like this is coming from bibtex::read.bib()? Maybe file an issue there? Or take a look at the RefManageR package, which may do a better job with all this.

lcolladotor commented 7 years ago

Hi,

Explicitly adding the citation using RefManageR::BibEntry() worked just like in https://github.com/leekgroup/derfinderHelper/commit/b63f8c4119686630a5b3cf71c36b16e3e719cf89.

Thanks, Leo

Citations I used

S4Vectors = RefManageR::BibEntry(bibtype = 'manual', key = 'S4Vectors',
    author = 'Hervé Pagès and Michael Lawrence and Patrick Aboyoun',
    title = "S4Vectors: S4 implementation of vector-like and list-like objects",
    year = 2017, doi = '10.18129/B9.bioc.S4Vectors')

GenomeInfoDb = RefManageR::BibEntry(bibtype = 'manual',
    key = 'GenomeInfoDb',
    author = 'Sonali Arora and Martin Morgan and Marc Carlson and H. Pagès',
    title = "GenomeInfoDb: Utilities for manipulating chromosome and other 'seqname' identifiers",
    year = 2017, doi = '10.18129/B9.bioc.GenomeInfoDb')

AnnotationDbi = RefManageR::BibEntry(bibtype = 'manual',
    key = 'AnnotationDbi',
    author = 'Hervé Pagès and Marc Carlson and Seth Falcon and Nianhua Li',
    title = 'AnnotationDbi: Annotation Database Interface',
    year = 2017, doi = '10.18129/B9.bioc.AnnotationDbi')

SummarizedExperiment = RefManageR::BibEntry(bibtype = 'manual',
    key = 'SummarizedExperiment',
    author = 'Martin Morgan and Valerie Obenchain and Jim Hester and Hervé Pagès',
    title = 'SummarizedExperiment: SummarizedExperiment container',
    year = 2017, doi = '10.18129/B9.bioc.SummarizedExperiment')