acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Malformed month in IWCS 2019 .bib files #360

Status: Open. nschneid opened this issue 5 years ago.

nschneid commented 5 years ago

E.g. https://aclweb.org/anthology/papers/W/W19/W19-0405.bib has

month = "23{--}27 " # may,
mbollmann commented 5 years ago

That looks generally correct to me, except maybe for the braces around the dash. may is a BibTeX macro and # is string concatenation.

EDIT: well, except that days shouldn't be in a "month" field to begin with... d'oh.
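To make the macro and concatenation point concrete, here is a minimal sketch of how a BibTeX field value like the one above is evaluated. It assumes the standard month macros from styles such as plain.bst (jan..dec expanding to full English names), handles only one level of nested braces, and expects the value without its trailing comma. The name `expand_field` is hypothetical; this is not how BibTeX, Zotero or Better BibTeX implement parsing, only an illustration of the semantics.

```python
import re

# Month macros as defined in standard BibTeX styles (e.g. plain.bst),
# which expand jan..dec to full English month names.
MONTH_MACROS = {
    "jan": "January", "feb": "February", "mar": "March", "apr": "April",
    "may": "May", "jun": "June", "jul": "July", "aug": "August",
    "sep": "September", "oct": "October", "nov": "November", "dec": "December",
}

# One token of a field value: a "quoted" string, a {braced} string
# (one level of nested braces only), a bare number, or a macro name.
TOKEN = re.compile(
    r'\s*(?:"([^"]*)"|\{((?:[^{}]|\{[^{}]*\})*)\}|(\d+)|([A-Za-z][\w-]*))\s*'
)


def expand_field(value: str, macros: dict = MONTH_MACROS) -> str:
    """Evaluate a BibTeX field value: expand macros, join parts split on '#'."""
    parts = []
    pos = 0
    while True:
        m = TOKEN.match(value, pos)
        if not m:
            raise ValueError(f"cannot parse field value at offset {pos}: {value!r}")
        quoted, braced, number, macro = m.groups()
        if macro is not None:
            key = macro.lower()  # BibTeX macro names are case-insensitive
            if key not in macros:
                raise KeyError(f"undefined macro: {macro}")
            parts.append(macros[key])
        else:
            # Exactly one of the literal groups matched (possibly as "").
            parts.append(next(p for p in (quoted, braced, number) if p is not None))
        pos = m.end()
        if pos == len(value):
            return "".join(parts)
        if value[pos] != "#":
            raise ValueError(f"expected '#' at offset {pos}: {value!r}")
        pos += 1  # skip the '#' concatenation operator


print(expand_field('"23{--}27 " # may'))
print(expand_field('jan # "--" # feb'))
```

Under these assumptions the first value expands to `23{--}27 May` and the second to `January--February`: both are well-formed BibTeX, but the first puts a day range into the month field.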

nschneid commented 5 years ago

I don't think the Zotero BibTeX parser recognizes this syntax (can't speak for other reference managers). Is this used in other Anthology .bib files?

mbollmann commented 5 years ago

Yes it is, see the discussion here: https://github.com/acl-org/acl-anthology/issues/94#issuecomment-451738779

mbollmann commented 5 years ago

FWIW, I tried importing this entry in Zotero and it shows the date as "23--27 05 2019", which seems like it generally handles it.

akoehn commented 5 years ago

I can't see any reference to days in #94, only to multi-month entries. If we wanted to have days in the bibtex entry, they should be in the day field, not the month one.

Edit: editing your comments is a bit like cheating; I didn't see your edit because edits are not distributed by email, @mbollmann :-)

nschneid commented 5 years ago

> FWIW, I tried importing this entry in Zotero and it shows the date as "23--27 05 2019", which seems like it generally handles it.

For me, Zotero imports it as "23–27 2019", missing the month entirely.

In any case, the date clearly shouldn't be listed under the month field.

mbollmann commented 5 years ago

> For me, Zotero imports it as "23–27 2019", missing the month entirely.

I should have mentioned that I'm using Zotero with Better BibTeX, maybe that's the difference?

> In any case, the date clearly shouldn't be listed under the month field.

Agreed. Since this is already recorded in the XML this way, I feel this is an issue with data ingestion. This particular instance should be easy to fix, but maybe we want to make sure to check for this in the future. @mjpost?
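A check at ingestion time could be as simple as the following sketch. It assumes the month arrives as a plain string (for example "23-27 May"), accepts only English month names or three-letter-plus prefixes, and allows at most a two-month range joined by hyphens or an en dash. The functions `month_number` and `check_month` are hypothetical and not part of the Anthology code base.

```python
import re

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def month_number(name: str):
    """Return 1-12 for a full English month name or a prefix of 3+ letters, else None."""
    name = name.strip().lower().rstrip(".")
    for i, full in enumerate(MONTHS, start=1):
        if name == full or (len(name) >= 3 and full.startswith(name)):
            return i
    return None


def check_month(value: str) -> list:
    """Return a list of problems in a month value; an empty list means it looks OK."""
    problems = []
    if re.search(r"\d", value):
        problems.append("contains digits; day ranges do not belong in the month field")
    # Accept a single month or a two-month range such as "June-July" or "June--July".
    parts = re.split(r"\s*[-\u2013]+\s*", value.strip())
    if not 1 <= len(parts) <= 2:
        problems.append("expected one month or a two-month range")
    bad = [p for p in parts if month_number(p) is None]
    if bad:
        problems.append(f"unrecognized month name(s): {bad}")
    return problems


print(check_month("May"))
print(check_month("23-27 May"))
```

Restricting to English names is a design choice that mirrors the BibTeX month macros; a localized data set would need a different lookup.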

mjpost commented 5 years ago

A few questions:

  1. Is day actually a BibTeX field?
  2. How would one specify a conference date range, e.g., 30 June to 4 July?
  3. What does Zotero do if you import an entry with a field like month = jan # "--" # feb?

I think we should fix the XML. Maybe

mbollmann commented 5 years ago

day is not a BibTeX field, and I think representing ranges is technically not supported at all in plain BibTeX.

BibLaTeX, on the other hand, supports date ranges with precision up to minutes, if desired, and can also distinguish between date of publication and date of event.
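As a rough illustration of the BibLaTeX approach, the sketch below formats an inclusive range in the ISO 8601 `start/end` form that BibLaTeX's date fields accept, and emits it as an `eventdate` field (the date of the event, as opposed to `date`, the date of publication). The function names are hypothetical, the 23 to 27 May 2019 dates come from the W19-0405 entry above, and the year in the "30 June to 4 July" example is chosen arbitrarily for illustration.

```python
from datetime import date


def biblatex_date_range(start: date, end: date) -> str:
    """Format an inclusive date range in the ISO 8601 'start/end' form used by BibLaTeX."""
    if end < start:
        raise ValueError("end date precedes start date")
    if start == end:
        return start.isoformat()  # a single day needs no range
    return f"{start.isoformat()}/{end.isoformat()}"


def eventdate_field(start: date, end: date) -> str:
    """Render a BibLaTeX 'eventdate' field line for the dates of the event itself."""
    return f"eventdate = {{{biblatex_date_range(start, end)}}},"


print(eventdate_field(date(2019, 5, 23), date(2019, 5, 27)))
print(biblatex_date_range(date(2019, 6, 30), date(2019, 7, 4)))
```

A range that crosses a month boundary needs no special handling here, which is exactly what the month-only BibTeX field cannot express.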

nschneid commented 5 years ago

> What does Zotero do if you import an entry with a field like month = jan # "--" # feb?

For me it just imports January and ignores the rest of the line, which I suppose is not the end of the world if we really want to support localized month names.

If we're willing to assume English month names, any of the following will work as expected, though I realize date is nonstandard BibTeX:

However, combining date with month or year is problematic. Exporting any of these back to BibTeX produces month = feb.

mjpost commented 5 years ago

I agree we should get rid of dates in this field, but Zotero's failure to parse jan # "--" # feb seems like a Zotero bug.