SuLab / WikidataIntegrator

A Wikidata Python module integrating the MediaWiki API and the Wikidata SPARQL endpoint
MIT License

Wrong date parsed in crossref API #191

Closed Adafede closed 2 years ago

Adafede commented 2 years ago

Hi,

Thanks to @Daniel-Mietchen, we noticed an error in our bot, which was taking the wrong date from the Crossref API.

My first reflex was to look at how you are doing it, and it looks like we are doing it the same way...

https://github.com/SuLab/WikidataIntegrator/blob/f2c92d2d3ecd7ee3cba03d81e50b008af4ea0a13/wikidataintegrator/wdi_helpers/publication.py#L362

This is the date the entry was created in CrossRef and not the date of publication, see http://api.crossref.org/works/10.1016%2Fs0031-9422%2800%2994305-x for an example.

This might imply some heavy curation of the article dates on WD...

Also tagging @bjonnh in case!

Happy to help!

Adafede commented 2 years ago

Any news?

Daniel-Mietchen commented 2 years ago

Thanks, @andrawaag !

andrawaag commented 2 years ago

The incorrect date was indeed parsed and used. The issue was that the Crossref API uses a data model that includes standardized timestamps for all included dates except the publication date. There, the date is only presented as a list within a list, where the dates might have different forms. I assumed that the date in that list follows year-month-day order, and that the list within the list actually contains only one date.

Format of the publication date:

[image not preserved]

Format of other dates:

[image not preserved]

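For readers without the screenshots, here is a minimal sketch of reading a date in the list-within-a-list form described above. It assumes the field is an object with a "date-parts" key whose first inner list is [year], [year, month] or [year, month, day], and that an unknown date may show up as [[None]]. The function name parse_date_parts and the sample values are hypothetical; this is not the code used in WikidataIntegrator.

```python
from typing import Optional, Tuple

# Wikidata time precision codes: 9 = year, 10 = month, 11 = day.
PRECISION_BY_LENGTH = {1: 9, 2: 10, 3: 11}


def parse_date_parts(field: dict) -> Optional[Tuple[str, int]]:
    """Return (ISO date, precision) for a Crossref-style date field.

    Hypothetical helper: assumes the layout {"date-parts": [[Y, M, D]]},
    where month and day may be missing.
    """
    parts_list = field.get("date-parts") or []
    if not parts_list:
        return None

    # Keep leading components only, stopping at the first missing one.
    parts = []
    for value in parts_list[0][:3]:
        if value is None:
            break
        parts.append(int(value))
    if not parts:
        return None

    year = parts[0]
    # Missing month or day is padded with 1; the precision code records
    # how much of the date is actually known.
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    return f"{year:04d}-{month:02d}-{day:02d}", PRECISION_BY_LENGTH[len(parts)]


# Invented sample values, for illustration only.
month_only = parse_date_parts({"date-parts": [[2002, 8]]})
full_date = parse_date_parts({"date-parts": [[2006, 7, 25]]})
```

Returning the precision alongside the date avoids pretending that a month-only date is known to the day.
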
Adafede commented 2 years ago

Thank you very much indeed!

Adafede commented 2 years ago

@Daniel-Mietchen Do you know if there will be a bot taking care of the correction of the already created dates?

Daniel-Mietchen commented 1 year ago

> @Daniel-Mietchen Do you know if there will be a bot taking care of the correction of the already created dates?

@Adafede I'm not aware of anyone running such a bot job. Given the scale of the edits, it should also probably be done by a dedicated account. Will think about it.

carlinmack commented 1 year ago

Hi, I'm currently looking at fixing this issue on Wikidata. I first want to elaborate on the conversation so far.

The example given in the OP of this issue is:

Listed are all the dates associated with this paper:

Originally, the publication date was taken from 'created'; in https://github.com/SuLab/WikidataIntegrator/commit/c9ac5f43f7e09ac42ae33a9447b503bd02fb71cd this behaviour was changed to use 'issued' instead.

Now let's look at this example:

Listed are all the dates associated with this paper:

Wikidata has the correct date for this paper (2002, source); however, using the now-preferred 'issued' property, we would say it was published in 2009.

'issued' does seem to be correct most of the time, but it would be great to figure out why it is wrong here. I have more mismatches to go through and will update with other examples.

Adafede commented 1 year ago

Hi @carlinmack, sorry for not replying earlier. Could you find out more?

carlinmack commented 12 months ago

I haven't looked thoroughly through the mismatches, but I haven't found any other similar cases since. I found some documentation in the API for these dates:

created - sort by created date
deposited - sort by time of most recent deposit
indexed - sort by time of most recent index
is-referenced-by-count - sort by number of times this DOI is referenced by other Crossref DOIs
issued - sort by issued date (earliest known publication date)
published - sort by publication date
published-online - sort by online publication date
published-print - sort by print publication date
references-count - sort by number of references included in the references section of the document identified by this DOI
relevance - sort by relevance score
score - sort by relevance score
updated - sort by date of most recent change to metadata, currently the same as deposited

So I think 'issued' is the most correct date, and I should most probably just report the issue with 10.1110/ps.4690102.
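
A rough sketch of how one might cross-check 'issued' against the other publication fields listed above, so that records where 'issued' is not actually the earliest date can be flagged for a report. It assumes each field, when present, has the "date-parts" layout discussed earlier in this thread. The names date_key and earliest_publication and the sample record are made up for illustration and do not reproduce any real Crossref response.

```python
from typing import Optional, Tuple

# Fields that describe publication; 'created', 'deposited' and 'indexed'
# are deliberately left out because they describe the metadata record.
PUBLICATION_FIELDS = ("issued", "published-print", "published-online", "published")


def date_key(field: Optional[dict]) -> Optional[Tuple[int, int, int]]:
    """Turn a Crossref-style date field into a comparable (Y, M, D) tuple."""
    parts_list = (field or {}).get("date-parts") or [[]]
    parts = []
    for value in parts_list[0][:3]:
        if value is None:
            break
        parts.append(int(value))
    if not parts:
        return None
    # Pad missing month/day with 1 purely so that tuples can be compared.
    return tuple(parts + [1] * (3 - len(parts)))


def earliest_publication(work: dict) -> Optional[Tuple[str, Tuple[int, int, int]]]:
    """Return (field name, date tuple) of the earliest publication field."""
    candidates = {}
    for name in PUBLICATION_FIELDS:
        key = date_key(work.get(name))
        if key is not None:
            candidates[name] = key
    if not candidates:
        return None
    # On ties, the first field in PUBLICATION_FIELDS wins, i.e. 'issued'.
    best = min(candidates, key=candidates.get)
    return best, candidates[best]


# Invented record in which 'issued' is later than the print date.
sample_work = {
    "created": {"date-parts": [[2006, 7, 25]]},
    "issued": {"date-parts": [[2009, 1]]},
    "published-print": {"date-parts": [[2002, 8]]},
}
result = earliest_publication(sample_work)
needs_report = result is not None and result[0] != "issued"
```

If the documentation is right that 'issued' is the earliest known publication date, needs_report should almost never be true, which makes it a cheap filter for finding records worth reporting.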

Adafede commented 6 months ago

> > @Daniel-Mietchen Do you know if there will be a bot taking care of the correction of the already created dates?

> @Adafede I'm not aware of anyone running such a bot job. Given the scale of the edits, it should also probably be done by a dedicated account. Will think about it.

I just started a batch of 48k corrections: https://quickstatements.toolforge.org/#/batch/225537 (and the next 20 batches)
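
For anyone wanting to prepare similar corrections, here is a hedged sketch of generating QuickStatements V1 lines that remove a wrong publication date (P577) and add the corrected one. It is not necessarily how the batch above was built. It assumes the tab-separated V1 syntax, time values written as +YYYY-MM-DDT00:00:00Z/precision, and a leading "-" to remove a statement. The function name is hypothetical, Q4115189 is the Wikidata sandbox item, and the dates are invented.

```python
from typing import List, Tuple

# A date is (ISO date, Wikidata precision): 9 = year, 10 = month, 11 = day.
DateValue = Tuple[str, int]


def quickstatements_date_fix(
    qid: str, old: DateValue, new: DateValue, prop: str = "P577"
) -> List[str]:
    """Build two QuickStatements V1 lines: remove the old date, add the new one.

    Hypothetical helper; the removal line only works if the old value
    matches the existing statement exactly, including its precision.
    """

    def fmt(value: DateValue) -> str:
        iso_date, precision = value
        return f"+{iso_date}T00:00:00Z/{precision}"

    remove_line = f"-{qid}\t{prop}\t{fmt(old)}"
    add_line = f"{qid}\t{prop}\t{fmt(new)}"
    return [remove_line, add_line]


# Invented example on the sandbox item: replace a day-precision 'created'
# date with a month-precision publication date.
lines = quickstatements_date_fix(
    "Q4115189", old=("2006-07-25", 11), new=("2002-08-01", 10)
)
batch_text = "\n".join(lines)
```

It is worth trying the output on the sandbox item first, since a removal that does not match the stored value exactly would leave the wrong date in place next to the new one.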