acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

DOI Database Missing Information or Incompatible with NSF Web Site #1432

Open neubig opened 3 years ago

neubig commented 3 years ago

Issue description

The DOI database for the Anthology seems to be missing venue information, or the information is not included in a form that the NSF web site can ingest. As a result, entering a DOI from the Anthology into the NSF web site (I tried EMNLP 2020, but I think others have similar issues) produces an error, and the "venue" field is not filled in. Please see the attached screenshot.

Steps to reproduce the issue

  1. Enter a DOI from the anthology (e.g. EMNLP 2020) into the NSF web site

What's the expected result?

All requisite information is filled in.

What's the actual result?

Not all requisite information is filled in, and an error is displayed.

Additional details / screenshot

mjpost commented 3 years ago

Thanks for reporting this!

We don't have a DOI database—just DOI URLs redirecting to canonical pages (importantly, not the PDFs). I don't think DOIs are expected to direct to a specific page type, so either I'm wrong, or maybe NSF is relying on a set of headers or HTML tags in order to parse information returned from the fetched DOI. Do you have any idea what it is?

If this is easy to implement it could get done quickly, but I am not sure I'll have time to do the research on this in the near future.
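
One way to test the HTML-tag hypothesis is to list the `citation_*` meta tags that a landing page exposes and see whether a venue tag is among them. The sketch below is only an illustration: the sample page is made up, and whether NSF reads these tags at all is an assumption, not something established in this thread. `citation_conference_title` is the tag name Google Scholar documents for conference venues.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.citation_tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name") or ""
        if name.startswith("citation_"):
            # A tag such as citation_author can repeat, so keep a list.
            self.citation_tags.setdefault(name, []).append(
                attr_map.get("content") or ""
            )


def extract_citation_tags(html_text):
    """Return a dict mapping each citation_* tag name to its values."""
    parser = CitationMetaParser()
    parser.feed(html_text)
    return parser.citation_tags


# Hypothetical landing page, not copied from the Anthology.
SAMPLE_PAGE = """
<html><head>
<meta name="citation_title" content="A Hypothetical Paper">
<meta name="citation_author" content="Ada Example">
<meta name="citation_author" content="Bo Sample">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""

tags = extract_citation_tags(SAMPLE_PAGE)
missing_venue = "citation_conference_title" not in tags
```

Running the same function on the HTML of a DOI that imports correctly and one that fails would show whether the two pages differ in the tags they expose.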

neubig commented 3 years ago

After looking around for this for a while I have absolutely no idea, and I don't care enough to look any further so I'll close this issue. Thanks for the response anyway!

mjpost commented 3 years ago

Fair enough. I'll keep it open, though. It seems likely there's a simple API we could implement here that NSF is expecting. We like to keep nimble, and these small conveniences add up.

nschneid commented 2 years ago

I am encountering this issue too. The main problem is that the conference name field is not filled in. (The abstract and other fields are also blank, but those are optional.)

mjpost commented 2 years ago

The XML we submit to DOI contains a lot of information other than the redirect URL. It occurs to me that this is what is probably exposed in the NSF <> DOI API. Here is the file we submitted for EMNLP 2020 main conference papers, per @neubig's mention. Can you see anything obviously wrong or missing?

nschneid commented 2 years ago

I'm no expert on DOIs but I notice that the <event_metadata> containing the conference name appears only under <conference> and not under each <conference_paper>, which may be what NSF is expecting.

The TACL DOI import worked correctly, so you could compare MIT Press's XML?
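
To make the structural observation concrete, here is a rough sketch that reports, for each paper in a deposit file, whether the conference name is only inherited from the enclosing `<conference>` or is also present under the paper itself. The XML fragment is invented for illustration and is not the submitted file; the `<doi_batch>`, `<body>`, `<titles>`, and `<doi_data>` wrappers and the DOIs are assumptions about the deposit layout. Only `<conference>`, `<event_metadata>`, and `<conference_paper>` come from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical deposit fragment; real files carry namespaces and more fields.
SAMPLE_DEPOSIT = """\
<doi_batch>
  <body>
    <conference>
      <event_metadata>
        <conference_name>Hypothetical Conference 2020</conference_name>
      </event_metadata>
      <conference_paper>
        <titles><title>First Hypothetical Paper</title></titles>
        <doi_data><doi>10.0000/example.1</doi></doi_data>
      </conference_paper>
      <conference_paper>
        <titles><title>Second Hypothetical Paper</title></titles>
        <doi_data><doi>10.0000/example.2</doi></doi_data>
      </conference_paper>
    </conference>
  </body>
</doi_batch>
"""


def local_name(element):
    """Return the tag without any '{namespace}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


def find_children(element, name):
    """Return direct children whose local tag name matches."""
    return [child for child in element if local_name(child) == name]


def venue_report(xml_text):
    """List each paper's DOI and where its venue information lives."""
    root = ET.fromstring(xml_text)
    rows = []
    for conference in root.iter():
        if local_name(conference) != "conference":
            continue
        # Conference name declared once, at the <conference> level.
        shared_names = [
            name.text
            for event in find_children(conference, "event_metadata")
            for name in find_children(event, "conference_name")
        ]
        for paper in find_children(conference, "conference_paper"):
            dois = [e.text for e in paper.iter() if local_name(e) == "doi"]
            rows.append({
                "doi": dois[0] if dois else None,
                "shared_conference_name": shared_names[0] if shared_names else None,
                "has_own_event_metadata": any(
                    local_name(e) == "event_metadata" for e in paper.iter()
                ),
            })
    return rows


rows = venue_report(SAMPLE_DEPOSIT)
```

Matching on local tag names keeps the check usable on namespaced files, so the same report could be run on the Anthology's XML and on a deposit that imports correctly to compare the two.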

mjpost commented 2 years ago

It would be annoying if metadata had to be redundantly listed for every paper. It doesn’t seem necessary from DOI’s own examples. But checking with TACL is a good idea.

Can you provide an example of a specific Anthology ID that works, and one that fails?

nschneid commented 2 years ago

mjpost commented 2 years ago

I added TACL.