Open cpmpercussion opened 1 week ago
It seems that the harmonise function is doing what we are asking it to do.
The current version of the harmoniser function uses the BibTexParser at line 36 with customization=homogenize_latex_encoding. So the behaviour with respect to character encoding is correct, although what happens to the title and url fields is odd. Apparently BibTexParser offers only two built-in customizations for encoding: homogenize_latex_encoding and convert_to_unicode. If we use the latter, the strange behaviours disappear, and there are no apparent changes in the .bib file, as the text is already Unicode.
So we either need to develop a 'custom' customization (possible?), or see whether migrating from BibTexParser 1.4 to 2.0 is a viable way to get UTF-8 text.
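On the "possible?" question: in BibTexParser 1.x a customization is, as far as I can tell, just a callable that takes one record dict and returns it, so a custom one looks feasible. Below is a minimal sketch of what that could look like, using only the standard library. The names plain_unicode, latex_to_unicode and SKIP_FIELDS are hypothetical, the accent table covers only a small subset of LaTeX, and the choice to leave url untouched is an assumption based on the odd url behaviour described above.

```python
import re
import unicodedata

# LaTeX accent command -> Unicode combining mark (deliberately small subset).
_COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}

_ACCENT = re.compile(
    r"""
    \{\\(['`^"~])([A-Za-z])\}    # braced form, e.g. {\'e}
    | \\(['`^"~])\{([A-Za-z])\}  # braced argument, e.g. \'{e}
    | \\(['`^"~])([A-Za-z])      # bare form, e.g. \'e
    """,
    re.VERBOSE,
)

# Fields that should never be rewritten (assumption: URLs and keys stay verbatim).
SKIP_FIELDS = {"url", "doi", "ID", "ENTRYTYPE"}


def _compose(match):
    # Exactly one alternative matched, so exactly two groups are not None.
    accent, char = [g for g in match.groups() if g is not None]
    # NFC merges the base letter and combining mark into one code point.
    return unicodedata.normalize("NFC", char + _COMBINING[accent])


def latex_to_unicode(text):
    """Convert a few common LaTeX escapes to plain Unicode text."""
    text = _ACCENT.sub(_compose, text)
    for escaped, plain in (("\\&", "&"), ("\\%", "%"), ("\\_", "_")):
        text = text.replace(escaped, plain)
    return text


def plain_unicode(record):
    """Hypothetical customization: one record dict in, same dict out."""
    for field, value in list(record.items()):
        if field in SKIP_FIELDS or not isinstance(value, str):
            continue
        record[field] = latex_to_unicode(value)
    return record
```

If this shape is right, it would be attached the same way as the built-in ones (parser.customization = plain_unicode in 1.x), but that wiring is untested here and should be checked against the installed version.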
As observed by @stefanofasciani:
This is incorrect behaviour:
This is because the .bib file is in BibTeX format but is used to create other text representations of the papers (e.g., NIME individual paper webpages and Zenodo entries). So we need the text in the BibTeX fields to be a "plain" UTF-8 representation of the text that could go into an HTML document or an API call, not something tuned to show up correctly in a LaTeX document.
The todo here is:
Ultimately we may want to move away from .bib files as a storage system, but they have an advantage of ubiquity within academic publishing and if the processes here break down at some point, the .bib files could easily be used in a different ad hoc system by other future maintainers.