Describe the bug
Records can have multiple <pub-date publication-format="electronic"> tags. In particular, Springer/Nature records can carry both <pub-date date-type="pub"> and <pub-date date-type="collection">, giving the publication date of the paper itself and the formal publication date of the collection (e.g. the issue) it belongs to. The current parser calls find_all("pub-date") and iterates over every date found, so an electronic date that has already been found is overwritten by any later one.
To Reproduce
See the file /proj/ads/abstracts/data/NATURE/npj/NPJ.052224/JOU=41467/VOL=2024.15/ISU=1/ART=48265/41467_2024_Article_48265_nlm.xml. There are two <pub-date publication-format="electronic"> tags in this file: the first has date-type="pub" (May 2024) and the second has date-type="collection" (December 2024). The article itself was published in May 2024, but the parser records it as December 2024.
Additional context
Using find_all is fine, but the parser needs to check date-type along with publication-format, with "pub" preferred over "collection". The collection date can go into otherDateType.
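A minimal sketch of that selection logic follows. It is not the existing parser code: it uses the standard library's xml.etree.ElementTree instead of the parser's find_all so that it runs on its own, and the function names, the otherDateValue key, the output date format, and the sample XML are all hypothetical. Ranking dates with no date-type between "pub" and "collection" is also an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the case described in this issue;
# it is not copied from the real NLM file.
SAMPLE = """
<article-meta>
  <pub-date date-type="pub" publication-format="electronic">
    <month>5</month><year>2024</year>
  </pub-date>
  <pub-date date-type="collection" publication-format="electronic">
    <month>12</month><year>2024</year>
  </pub-date>
</article-meta>
"""

# Lower rank wins. Placing untyped dates between "pub" and "collection"
# is an assumption, not something the issue specifies.
RANK = {"pub": 0, "collection": 2}
DEFAULT_RANK = 1


def format_date(node):
    """Return YYYY, YYYY-MM or YYYY-MM-DD for a pub-date element, or None."""
    year = (node.findtext("year") or "").strip()
    if not year:
        return None
    parts = [year]
    month = (node.findtext("month") or "").strip()
    day = (node.findtext("day") or "").strip()
    if month:
        parts.append(month.zfill(2))
        if day:
            parts.append(day.zfill(2))
    return "-".join(parts)


def select_electronic_dates(root):
    """Pick the preferred electronic date; return unused collection dates separately."""
    candidates = []
    for node in root.iter("pub-date"):
        if node.get("publication-format") != "electronic":
            continue
        value = format_date(node)
        if value is not None:
            candidates.append((node.get("date-type"), value))
    if not candidates:
        return None, []
    # min() keeps the first of equally ranked candidates, so document order breaks ties.
    best = min(
        range(len(candidates)),
        key=lambda i: RANK.get(candidates[i][0], DEFAULT_RANK),
    )
    other = [
        {"otherDateType": dtype, "otherDateValue": value}
        for i, (dtype, value) in enumerate(candidates)
        if i != best and dtype == "collection"
    ]
    return candidates[best][1], other


electronic, other = select_electronic_dates(ET.fromstring(SAMPLE.strip()))
print(electronic, other)
```

Because the choice is made by rank rather than by iteration order, the result is the same whether the "pub" tag appears before or after the "collection" tag. A collection date is used as the electronic date only when no better-ranked electronic date exists.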