UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

bib entry for UD 2.7: overleaf reports unicode error #793

Closed jowagner closed 1 year ago

jowagner commented 3 years ago

There appears to be a Unicode combining character, U+0300, between "o}" and " and Omura" in the bib entry provided on https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3424

$ echo "Ad{\'e}day{\d o}̀ and Omura" | hexdump -C
00000000  41 64 7b 5c 27 65 7d 64  61 79 7b 5c 64 20 6f 7d  |Ad{\'e}day{\d o}|
00000010  cc 80 20 61 6e 64 20 4f  6d 75 72 61 0a           |.. and Omura.|
0000001d
>>> 'U+%04x' %ord(b'\xcc\x80'.decode('UTF-8'))
'U+0300'

[Screenshot: 11234-1-3424-bib-error]

It displays as

o}`and Omura

in the Overleaf editor (and when displaying the download page in Firefox, see screenshot above), and Overleaf fails to compile with a Unicode error.
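
A minimal sketch for locating such characters in a bib entry before compiling, using Python's standard `unicodedata` module. The function name `find_combining_marks` and the sample string are illustrative assumptions, not part of any Lindat or Overleaf tooling.

```python
import unicodedata


def find_combining_marks(text):
    """Return (index, code point, Unicode name) for every combining character.

    Hypothetical helper for illustration only.
    """
    hits = []
    for i, ch in enumerate(text):
        # unicodedata.combining() returns a non-zero class for combining marks
        if unicodedata.combining(ch):
            label = "U+%04X" % ord(ch)
            hits.append((i, label, unicodedata.name(ch, "UNKNOWN")))
    return hits


# The fragment from the hexdump above, with the stray U+0300 after "o}"
entry = "Ad{\\'e}day{\\d o}\u0300 and Omura"
for hit in find_combining_marks(entry):
    print(hit)
```

Running a check like this over a downloaded .bib file should point at the offset where the compile error originates.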

dan-zeman commented 3 years ago

This is a Lindat/Clariah issue, which probably pertains to their clarin-dspace repository, as previous BibTeX issues have been discussed there in https://github.com/ufal/clarin-dspace/issues/515 (@kosarko).

The last letter in the name Adedayo (Adédayọ̀) should be the Latin small letter o with a dot below and a grave accent above. Unicode cannot represent this as a single character; its support for Yoruba is suboptimal. So the letter has to be represented with the help of a combining diacritical mark (here U+0300, the combining grave accent).
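
As a quick check of the point about precomposed characters, the following sketch normalizes the letter to NFC with Python's `unicodedata` module. The variable names are illustrative; the expectation is that the dot below composes into a single code point, while the grave accent is left as a separate combining mark.

```python
import unicodedata

# o + combining dot below (U+0323) + combining grave accent (U+0300)
decomposed = "o\u0323\u0300"

# NFC composes as far as Unicode allows
composed = unicodedata.normalize("NFC", decomposed)

print(len(composed))
print(["U+%04X" % ord(ch) for ch in composed])
```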

I suppose that the appropriate TeX macro would be {\`{\d o}} or {\d{\`o}}. But the Lindat BibTeX generator would need additional support for the Unicode combining diacritics.
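
The kind of support a generator would need can be sketched as follows. This is only an illustration, not the Lindat implementation: the function `to_tex` and the table `ACCENT_MACROS` are hypothetical, the table covers only the three marks that occur in this name, and the output uses the first of the two nestings suggested above (written with braces, `\d{o}` instead of `\d o`), which has not been tested in Overleaf here.

```python
import unicodedata

# Hypothetical mapping from combining marks to TeX accent macros;
# it only covers the marks needed for this example.
ACCENT_MACROS = {
    "\u0300": "\\`",  # combining grave accent
    "\u0301": "\\'",  # combining acute accent
    "\u0323": "\\d",  # combining dot below
}


def to_tex(text):
    """Rewrite accented letters as nested TeX accent macros (sketch)."""
    # NFD splits precomposed letters into a base plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    i = 0
    while i < len(decomposed):
        base = decomposed[i]
        i += 1
        marks = []
        # Collect every combining mark attached to this base character
        while i < len(decomposed) and unicodedata.combining(decomposed[i]):
            marks.append(decomposed[i])
            i += 1
        tex = base
        for mark in marks:
            macro = ACCENT_MACROS.get(mark)
            if macro is None:
                raise ValueError("no TeX macro known for U+%04X" % ord(mark))
            # Wrap from the innermost mark outwards
            tex = "%s{%s}" % (macro, tex)
        out.append("{%s}" % tex if marks else tex)
    return "".join(out)


print(to_tex("Ad\u00e9day\u1ecd\u0300"))
```

Unknown marks raise an error instead of passing through silently, so a gap in the table would be noticed when the entry is generated and not at compile time.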

jowagner commented 3 years ago

OK, where can we submit a feature request for the Lindat BibTeX generator? Is https://github.com/ufal/clarin-dspace/issues the right place?

I can confirm that

Ad{\'e}day{\d{\`o}}

renders correctly in Overleaf (with EMNLP 2021 style files; I also use

\DeclareUnicodeCharacter{01B0}{\`u}
\DeclareUnicodeCharacter{01A1}{\`o}

copied from an older paper, but I haven't tested whether this is still needed).

dan-zeman commented 3 years ago

> OK, where can we submit a feature request for the Lindat BibTeX generator? Is https://github.com/ufal/clarin-dspace/issues the right place?

I think it is.