acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Publishing non-Latin author names alongside the Latin-script version #2576

Open chikiulo opened 1 year ago

chikiulo commented 1 year ago

It would be nice to have an option for authors to have their names published in their own language alongside the Latin-script versions of their names. This is probably beyond the scope of the Anthology's work and may require coordination with venue publication chairs.

Academic journals in other research fields already offer this option to authors:

Including author names using non-Roman alphabets | Astronomy & Astrophysics

Non-Latin Author Names | Journal of Neuroscience

Physical Review Journals - Information About Author Names

I am aware of the complexity of including author names in non-Latin scripts in the Anthology records and subsequent citations. As such, simply adding non-Latin names alongside the Latin ones in the document's LaTeX source may not be the best solution. One possible solution would be a unique ID tied to a data object that contains information such as "name to be rendered on paper", "name to appear in citation records", etc.
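To make the "unique ID plus data object" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the class name `AuthorName`, its fields, the ID format, and the placeholder name are made up for illustration and do not correspond to anything that exists in the Anthology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthorName:
    """Hypothetical per-author record; all field names are illustrative."""

    author_id: str                      # stable unique ID the names are tied to
    citation_name: str                  # Latin-script name used in citation records
    native_name: Optional[str] = None   # name in the author's own script, if given

    def name_on_paper(self) -> str:
        """Name as rendered on the paper: Latin script, native script alongside."""
        if self.native_name:
            return f"{self.citation_name} ({self.native_name})"
        return self.citation_name


# Placeholder author, not a real record.
author = AuthorName("example-author-1", "Xiaoming Wang", "王小明")
print(author.name_on_paper())   # name for the paper's author list
print(author.citation_name)     # name for citation records
```

Keeping the citation name as its own field means citations stay in Latin script even when a native-script name is attached to the same ID.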

akoehn commented 1 year ago

This would be possible to do in principle in the anthology, but:

On the anthology side, there is no relation to LaTeX; we store all information in XML files.

mbollmann commented 1 year ago

We already support storing non-Latin name variants in our XML on a per-paper basis; see the 2020 CCL proceedings for an example. They are currently shown in brackets on the author list of the paper, but are not used for the citations.
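As a rough illustration of that behavior, the sketch below parses a small XML fragment and renders the variant in brackets on the author list while leaving it out of the citation name. The fragment is an assumption written for this example: the element and attribute names (`variant`, `script`, `first`, `last`), the placeholder authors, and the helper functions are illustrative and not copied from the Anthology's data or code; the 2020 CCL proceedings files show the real markup.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; element and attribute names are assumptions.
SAMPLE = """
<paper id="1">
  <author>
    <first>Xiaoming</first><last>Wang</last>
    <variant script="hani"><first>小明</first><last>王</last></variant>
  </author>
  <author><first>Jane</first><last>Doe</last></author>
</paper>
"""


def citation_name(author: ET.Element) -> str:
    """Latin-script name only, as would be used for citations."""
    # findtext with a bare tag name looks at direct children only,
    # so the <first>/<last> inside <variant> are not picked up here.
    return f"{author.findtext('first')} {author.findtext('last')}"


def author_list_name(author: ET.Element) -> str:
    """Latin-script name, with the variant in brackets when one is stored."""
    latin = citation_name(author)
    variant = author.find("variant")
    if variant is None:
        return latin
    # Assumption for this sketch: family name first, no space, in the variant.
    native = f"{variant.findtext('last')}{variant.findtext('first')}"
    return f"{latin} [{native}]"


paper = ET.fromstring(SAMPLE)
authors = paper.findall("author")
author_list = [author_list_name(a) for a in authors]
citations = [citation_name(a) for a in authors]
print(author_list)
print(citations)
```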

We also have mechanisms for assigning IDs to authors (for disambiguation purposes) and could, in theory, think about attaching non-Latin name variants directly to an author as well; but the main question then is, like Arne said, where we would obtain that data from.

Also, like Arne said, we do not create or modify LaTeX or PDF files at all, so anything that relates to that is indeed outside the scope of the Anthology.

mbollmann commented 1 year ago

FWIW, this support for name variants happened mainly because the CCL conference originally didn't have Romanized names in their metadata (see #1027 for the entire history and discussion), but I don't see why this feature couldn't be extended further.