Open mjpost opened 1 year ago
I thought we do not ingest frontmatter anymore?
We ingest it if it's there. Most of ACL had frontmatter. We need to check why we didn't assign DOIs. I think we did in the past.
You recently changed the logic for them in the DOI generation script; maybe that has something to do with it? Did we change how frontmatters are represented in the XML and then leave the DOI generation script in an outdated state until you changed it?
DOI ingestion is two steps:
1. bin/generate_crossref_doi_metadata.py produces a big nasty XML file that we upload and use to generate DOIs.
2. bin/add_dois.py goes through each paper in a volume, checks if its DOI works, and if so, adds it to our XML.
What I changed is (2), which was broken because it assumed there was always a <frontmatter> block, which there wasn't for EMNLP 2022, because they never delivered it. I didn't change (1). Looking at past frontmatters, we don't in fact generate a DOI for the volume itself. We probably should.
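For illustration, here is a minimal sketch of what step (2) could look like once it tolerates a missing <frontmatter> block. It is not the actual bin/add_dois.py. The function name, the doi_resolves callback, the volume/paper element layout, and the DOI prefix are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: DOIs are the prefix plus the Anthology ID.
DOI_PREFIX = "10.18653/v1/"


def add_dois(volume, volume_id, doi_resolves):
    """Attach a <doi> child to every entry whose DOI resolves.

    volume       -- an ElementTree <volume> element (hypothetical layout)
    volume_id    -- e.g. "2022.emnlp-main"
    doi_resolves -- callable taking a DOI string and returning True/False;
                    injected so the sketch needs no network access
    Returns the list of DOIs that were added.
    """
    targets = []

    # The frontmatter is treated as paper 0, but only if the block exists.
    frontmatter = volume.find("frontmatter")
    if frontmatter is not None:
        targets.append((frontmatter, "0"))

    for paper in volume.findall("paper"):
        targets.append((paper, paper.get("id")))

    added = []
    for element, paper_id in targets:
        if element.find("doi") is not None:
            continue  # already has a DOI, leave it alone
        doi = f"{DOI_PREFIX}{volume_id}.{paper_id}"
        if doi_resolves(doi):
            ET.SubElement(element, "doi").text = doi
            added.append(doi)
    return added
```

Passing the resolver in as a callable keeps the XML handling testable on its own. A real run would presumably plug in an HTTP check against the DOI resolver.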
Actually, though, this reminds me that I also changed the ingestion script (post-EMNLP) to always generate the <frontmatter>. If there's no frontmatter PDF, we still need the block; we just don't generate the <url> tag inside it. We need to add this to EMNLP.
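A rough sketch of that ingestion behaviour, assuming a <volume> element whose children are metadata followed by <paper> entries. The helper name ensure_frontmatter and its pdf_url parameter are hypothetical and not taken from the real ingestion script.

```python
import xml.etree.ElementTree as ET


def ensure_frontmatter(volume, pdf_url=None):
    """Make sure the volume has a <frontmatter> block and return it.

    The block is always created; a <url> child is added only when a
    frontmatter PDF actually exists (pdf_url is not None).
    """
    frontmatter = volume.find("frontmatter")
    if frontmatter is None:
        frontmatter = ET.Element("frontmatter")
        # Place the block just before the first <paper>, or at the end
        # if the volume has no papers yet.
        papers = volume.findall("paper")
        index = list(volume).index(papers[0]) if papers else len(volume)
        volume.insert(index, frontmatter)

    if pdf_url is not None and frontmatter.find("url") is None:
        ET.SubElement(frontmatter, "url").text = pdf_url
    return frontmatter
```

Calling it on an existing volume such as EMNLP 2022 would add the empty block without touching the papers, and calling it again later with a PDF location would only fill in the <url>.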
Whatever the reasoning for always generating the <frontmatter> block was, I still suspect it's the wrong solution to a problem I don't yet understand.
<frontmatter> is just the special stub for paper 0. If we don't generate it, then no bibtex is generated for the volume itself. We want to generate this volume bibtex even if there is no PDF. (If there is a PDF, we add a <url> tag within frontmatter, as we do for papers.) This is all a separate issue.
We don't generate DOIs for the complete volume or frontmatter, and haven't for some time. If we want to, we just need to figure it out. I haven't had time to do this. See also #726.
Re-upping this for this quarter—we should generate DOIs for front matter. This involves:
We didn't get DOIs in frontmatter for EMNLP 22 or ACL 23.
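To see which volumes are affected, something like the following audit could be run over a collection file. It is only a sketch: it assumes a <collection id="..."> root containing <volume id="..."> children, and the function name is made up for the example.

```python
import xml.etree.ElementTree as ET


def volumes_missing_frontmatter_doi(collection):
    """Return IDs of volumes whose frontmatter is absent or has no <doi>."""
    missing = []
    for volume in collection.findall("volume"):
        frontmatter = volume.find("frontmatter")
        if frontmatter is None or frontmatter.find("doi") is None:
            # Volume ID is built as "<collection id>-<volume id>".
            missing.append(f"{collection.get('id')}-{volume.get('id')}")
    return missing
```

A volume with no <frontmatter> block at all is reported alongside one that has the block but no DOI, since both need attention before frontmatter DOIs can be generated.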