acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Eval4NLP 2023 Ingestion #3081

Closed: mjpost closed this issue 6 months ago

mjpost commented 7 months ago

Don't forget to link it to WS, and reference it from AACL and IJCNLP.

Originally posted by @flipz357 in https://github.com/acl-org/acl-anthology/issues/2791#issuecomment-1937847436

flipz357 commented 7 months ago

The relevant material was linked by @kfnks in an earlier comment: https://www.dropbox.com/scl/fi/wu74zqzgdp34p0ifreznd/papers.yml?rlkey=uqtttxtbnuzrl4tl1jlwfqjs1&dl=0

anthology-assist commented 7 months ago

@kfnks @flipz357 The ingestion material for Eval4NLP is incomplete. I saw that it's in aclpub2 format; we need all .yaml files and everything in the expected output section specified here. Let me know if anything is unclear.

flipz357 commented 7 months ago

(ping @danieldeutsch)

danieldeutsch commented 7 months ago

Can you be more specific about what is missing? I zipped the output from the aclpub2 tool. This is what I delivered to the publication chairs: https://drive.google.com/file/d/1AOKfrBeEIL1sViZWuAfbsfUw1actbFpg/view?pli=1

flipz357 commented 7 months ago

@anthology-assist I also see all the required .yml files in the zip provided by @danieldeutsch, so it's not clear to me what exactly is missing.

flipz357 commented 6 months ago

@anthology-assist Could you comment on what is still missing?

mjpost commented 6 months ago

I'm not sure what's missing and hope @anthology-assist will respond soon. Could you run this script? It might turn something up.

flipz357 commented 6 months ago

Thanks @mjpost, maybe @anthology-assist is busy or away at the moment. @danieldeutsch could you try running the script?

flipz357 commented 6 months ago

@anthology-assist @mjpost I have now run the script, and everything seems fine:

✓ Found anth/inputs/conference_details.yml
✓ Found anth/inputs/papers.yml
✓ Found PDF file anth/watermarked_pdfs/1.pdf
✓ Found PDF file anth/watermarked_pdfs/2.pdf
✓ Found PDF file anth/watermarked_pdfs/3.pdf
✓ Found PDF file anth/watermarked_pdfs/4.pdf
✓ Found PDF file anth/watermarked_pdfs/5.pdf
✓ Found PDF file anth/watermarked_pdfs/6.pdf
✓ Found PDF file anth/watermarked_pdfs/7.pdf
✓ Found PDF file anth/watermarked_pdfs/8.pdf
✓ Found PDF file anth/watermarked_pdfs/9.pdf
✓ Found PDF file anth/watermarked_pdfs/10.pdf
✓ Found PDF file anth/watermarked_pdfs/11.pdf
✓ Found PDF file anth/watermarked_pdfs/12.pdf
✓ Found PDF file anth/watermarked_pdfs/13.pdf
✓ Found PDF file anth/watermarked_pdfs/14.pdf
✓ Found PDF file anth/watermarked_pdfs/15.pdf
✓ Found PDF file anth/watermarked_pdfs/16.pdf
✓ Found PDF file anth/watermarked_pdfs/17.pdf
✓ Found PDF file anth/watermarked_pdfs/18.pdf
✓ Found PDF file anth/watermarked_pdfs/19.pdf
✓ Found frontmatter at anth/watermarked_pdfs/0.pdf
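The checks reported above can be sketched roughly as follows. This is a hypothetical reimplementation inferred only from the script's output, not the Anthology's actual validation script; the function name `check_ingestion_layout` and the `num_papers` parameter are assumptions (the real script may, for example, derive the paper count from papers.yml instead).

```python
from pathlib import Path
from typing import List


def check_ingestion_layout(root: Path, num_papers: int) -> List[str]:
    """Return a list of problems; an empty list means every expected file exists.

    Hypothetical layout check inferred from the script output above (not the
    Anthology's real validator):
      inputs/conference_details.yml, inputs/papers.yml,
      watermarked_pdfs/1.pdf ... watermarked_pdfs/<num_papers>.pdf,
      and the frontmatter at watermarked_pdfs/0.pdf.
    """
    problems = []

    # The two metadata files delivered with the aclpub2 output.
    for name in ("conference_details.yml", "papers.yml"):
        if not (root / "inputs" / name).is_file():
            problems.append(f"missing inputs/{name}")

    # One watermarked PDF per paper, numbered from 1.
    pdf_dir = root / "watermarked_pdfs"
    for i in range(1, num_papers + 1):
        if not (pdf_dir / f"{i}.pdf").is_file():
            problems.append(f"missing watermarked_pdfs/{i}.pdf")

    # 0.pdf holds the frontmatter.
    if not (pdf_dir / "0.pdf").is_file():
        problems.append("missing frontmatter watermarked_pdfs/0.pdf")

    return problems
```

For this delivery, `check_ingestion_layout(Path("anth"), 19)` should return an empty list if the files are laid out as in the output above. Note that a check like this only confirms the files exist; it says nothing about whether their contents are formatted correctly.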

danieldeutsch commented 6 months ago

Thanks, Juri, for running the script! The last issue I was made aware of was that papers.yml was incorrectly formatted. In the original delivery of the proceedings, I used the output of the aclpub2 tool as-is. The generated file did not conform to the expected output described in the aclpub2 README, so I edited it manually to match the expected format as best I could.

If any .yml file is still in an unexpected format, there are only 19 papers, so I think correcting the file manually is the quickest way to get this resolved.

mjpost commented 6 months ago

The validation script doesn't (yet) check the internal formatting of the files, so it's possible something is still wrong. @anthology-assist can handle this shortly.

mjpost commented 6 months ago

There is a preview here; can you please take a look? Note that we only generate preview bibtex for the first three items.

flipz357 commented 6 months ago

Thanks, it all looks good, except for one strange author name.

I think it could come from a bug in conference_details.yml. @danieldeutsch, do you know where this "John Walker" comes from? :-)

How important is it to correct this, or can it also be corrected later?

flipz357 commented 6 months ago

I've now corrected the names in the file manually; can you replace the old version with this one?

conference_details.yml

anthology_venue_id: Eval4NLP
start_date: 2023-11-01
end_date: 2023-11-01
isbn: 979-8-89176-021-9
location: Bali, Indonesia
editors:
  - first_name: Daniel
    last_name: Deutsch
  - first_name: Rotem
    last_name: Dror
  - first_name: Steffen
    last_name: Eger
  - first_name: Yang
    last_name: Gao
  - first_name: Christoph
    last_name: Leiter
  - first_name: Juri
    last_name: Opitz
  - first_name: Andreas
    last_name: Rücklé
publisher: Association for Computational Linguistics
watermark_book_title: Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems
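Since the stray name appears to come from the metadata, a quick sanity check on conference_details.yml could catch this kind of problem before ingestion. The sketch below is hypothetical: the function `check_conference_details`, the `expected_editors` parameter, and the choice to treat every key shown above as required are assumptions, not part of aclpub2 or the Anthology tooling. It works on the already-parsed file, e.g. the dict returned by PyYAML's `yaml.safe_load`, which reads unquoted dates such as `2023-11-01` as `datetime.date` objects.

```python
import datetime
from typing import List, Optional, Set, Tuple

# Keys taken from the conference_details.yml above; treating all of them
# as required is an assumption made for this sketch.
REQUIRED_KEYS = (
    "anthology_venue_id", "start_date", "end_date", "isbn",
    "location", "editors", "publisher", "watermark_book_title",
)


def check_conference_details(
    details: dict,
    expected_editors: Optional[Set[Tuple[str, str]]] = None,
) -> List[str]:
    """Return problems found in an already-parsed conference_details.yml."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in details]

    # Only compare dates that were actually parsed as dates.
    start, end = details.get("start_date"), details.get("end_date")
    if isinstance(start, datetime.date) and isinstance(end, datetime.date):
        if start > end:
            problems.append("start_date is after end_date")

    for i, editor in enumerate(details.get("editors") or []):
        if not isinstance(editor, dict):
            problems.append(f"editors[{i}] is not a mapping")
            continue
        first, last = editor.get("first_name"), editor.get("last_name")
        for field, value in (("first_name", first), ("last_name", last)):
            if not isinstance(value, str) or not value.strip():
                problems.append(f"editors[{i}] has no {field}")
        # Flag names the organizers did not list (e.g. a stray placeholder).
        if expected_editors is not None and (first, last) not in expected_editors:
            problems.append(f"editors[{i}] is unexpected: {first} {last}")

    return problems
```

Passing the set of known editor names as `expected_editors` is what would surface an unexpected entry; without it, the function only checks that the required keys exist, that the dates are in order, and that every editor has both name fields.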

Apart from this, everything else LGTM.

mjpost commented 6 months ago

It will be easier just to update the XML directly. If you guys want to open the PR, I can approve it, which might make things go faster. Otherwise, @anthology-assist will add it to her queue.

flipz357 commented 6 months ago

I opened a PR. I hope I did everything correctly.