acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Guidance for program/workshop chairs regarding paper metadata? #927

Open nschneid opened 4 years ago

nschneid commented 4 years ago

I am a workshop organizer and find that the START final submission form needs a nontrivial amount of customization.

Given that author name cleanliness has been an issue, I am including

Should the Anthology provide official guidelines on what should appear on the final submission form, in order to improve the quality of the metadata to be ingested? Maybe there are other recommendations that would make sense as well.

nschneid commented 4 years ago

Digging around in START a bit more, I see there is a tool called "Title Case Formatter for Titles/Authors" which suggests capitalization fixes using heuristics. I suggest we recommend this IN ADDITION to recommending that authors double-check. Care should also be taken that changes made in the formatter tool are reflected in the PDF.

mjpost commented 4 years ago

I don't have this fresh in memory, so screenshots would help. But a note that name formatting is drawn from the global profile could be helpful.

It would be a good service to chairs to have this documented. I would put this in https://github.com/acl-org/acl-pub (perhaps after the consolidation).

As for title case protection, I wonder if we can just handle that on the Anthology side. We got a few people at ACL 2020 who {D}id {S}omething {L}ike {T}his to their titles. Perhaps documentation closer to the actual edit field would stave that off.
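One way the Anthology-side handling could look is sketched below. It is only an illustration, not existing Anthology code: the function name `unprotect_if_overprotected`, the 0.5 threshold, and the rule "at least two words with a single brace-protected initial" are all assumptions made for the example. The idea is to strip the braces only when a title looks over-protected, and to leave deliberate protection such as `{B}ayesian` or `{BERT}` alone.

```python
import re

# One brace-protected capital followed by a lower-case letter, e.g. "{D}id".
# Multi-letter groups such as "{BERT}" do not match.
PROTECTED_INITIAL = re.compile(r"^\{([A-Z])\}(?=[a-z])")


def unprotect_if_overprotected(title: str, threshold: float = 0.5) -> str:
    """Drop per-word initial protection when most words carry it.

    Hypothetical heuristic: `threshold` is the minimum share of words that
    must start with a protected initial before anything is changed.
    """
    words = title.split()
    if not words:
        return title
    hits = sum(1 for w in words if PROTECTED_INITIAL.match(w))
    if hits < 2 or hits / len(words) < threshold:
        # Looks like deliberate protection of a few words; leave it untouched.
        return title
    # Note: joining on a single space also normalizes runs of whitespace.
    return " ".join(PROTECTED_INITIAL.sub(r"\1", w) for w in words)


print(unprotect_if_overprotected("{D}id {S}omething {L}ike {T}his"))
print(unprotect_if_overprotected("A {B}ayesian Approach to {BERT}"))
```

The first call removes the braces, while the second title is returned unchanged because only one word has a protected initial. Whether silent rewriting like this is desirable is a separate question.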

nschneid commented 4 years ago

> name formatting is drawn from the global profile

Only for registered authors. At SemEval we are finding a lot of the papers have unregistered coauthors.

danielgildea commented 3 years ago

I think it would be nice if START gave a warning to authors if any name is either all upper case or all lower case. There were a huge number of these in ACL 2020: both Chinese names that were all lower case and European names with the family name in full caps. It would be nice to catch these before they even get to the program chair, much less the Anthology.

mjpost commented 3 years ago

Agreed. But I wonder if we could just handle this ourselves? That is

The argument being that:

nschneid commented 3 years ago

I worry about the Anthology recasing in ways that are not transparent to the user. So better if we can implement it in START as well. For SemEval we had a checkbox in the final submission form requiring authors to double-check name spelling and capitalization; maybe START could trigger such a confirmation when entering a name in the author field if and only if the name doesn't conform to a regular expression.
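To make the proposed check concrete, here is a minimal sketch of the kind of test that could trigger a confirmation prompt. It is not START's behavior or API; the function name `needs_confirmation` is hypothetical. It also swaps the regular expression for Python's built-in string methods, since the standard `re` module has no Unicode upper/lower-case character classes, and it flags exactly the two cases raised earlier in this thread: a name field that is entirely upper case or entirely lower case.

```python
def needs_confirmation(name: str) -> bool:
    """Return True if a name field is all upper case or all lower case.

    Hypothetical check: fields with fewer than two letters (e.g. the
    initial "J.") and names in scripts without case are never flagged.
    """
    letters = [ch for ch in name if ch.isalpha()]
    if len(letters) < 2:
        return False
    # str.isupper()/islower() require at least one cased character, so
    # caseless scripts return False for both.
    return name.isupper() or name.islower()


for candidate in ["zhang wei", "MÜLLER", "Müller", "张伟", "J."]:
    print(candidate, needs_confirmation(candidate))
```

Because the check only asks the author to confirm, a false positive (for example, a family-name field that legitimately reads "van") costs one extra click rather than changing any data, which keeps the recasing transparent to the user.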

(And if we're asking START to add functionality, could we ask for ORCID fields as well?)