Adafede closed this issue 2 years ago.
No news?
Thanks, @andrawaag!
The incorrect date was indeed parsed and used. The problem was that the Crossref API uses a data model for dates that includes a standardized timestamp for every date it lists, except for the publication date. There, the date is only given as a list within a list ("date-parts"), and the entries can take different forms (for example year only, or year and month). I assumed that the date in that list follows year-month-day order, and that the nested list actually holds only one date.
(Screenshots: format of the publication date; format of other dates)
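A minimal Python sketch of parsing that nested list defensively, assuming the Crossref REST API shape {"date-parts": [[year, month, day]]} with month and day optional and occasionally null. Rather than assuming a single full year-month-day date, it returns None when the outer list does not hold exactly one date, and it reports how many parts were present as a Wikibase time precision (9 = year, 10 = month, 11 = day). The function name is hypothetical and is not taken from WikidataIntegrator.

```python
from typing import Optional, Tuple

# Wikibase time precision for 1, 2 or 3 known date parts.
PRECISION = {1: 9, 2: 10, 3: 11}


def parse_date_parts(date_field: dict) -> Optional[Tuple[int, int, int, int]]:
    """Turn a Crossref date object into (year, month, day, precision).

    Missing month/day become 0, which is how Wikibase writes
    reduced-precision dates (e.g. +2002-00-00T00:00:00Z with precision 9).
    """
    parts_list = date_field.get("date-parts") or []
    if len(parts_list) != 1:
        # Ambiguous or empty: refuse instead of guessing which date is meant.
        return None
    parts = []
    for p in parts_list[0]:
        if p is None:  # unknown parts may be given as null
            break
        parts.append(int(p))
    if not 1 <= len(parts) <= 3:
        return None
    year, month, day = (parts + [0, 0])[:3]
    return year, month, day, PRECISION[len(parts)]
```

Keeping the precision alongside the values means a year-only record is not silently turned into 1 January of that year.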
Thank you very much indeed!
@Daniel-Mietchen Do you know if there will be a bot taking care of the correction of the already created dates?
@Adafede I'm not aware of anyone running such a bot job. Given the scale of the edits, it should also probably be done by a dedicated account. Will think about it.
Hi, I'm currently looking at fixing this issue on Wikidata. I first want to elaborate on the conversation so far.
The example given in the OP of this issue is:
Listed are all the dates associated with this paper:
Originally, the publication date was taken from the 'created' field, and then in https://github.com/SuLab/WikidataIntegrator/commit/c9ac5f43f7e09ac42ae33a9447b503bd02fb71cd this behaviour was changed to use 'issued' instead.
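The difference between the two behaviours can be sketched with a trimmed, made-up record; the values below are illustrative only and do not come from any real DOI. The hypothetical helper simply reads the year from whichever Crossref field it is told to use.

```python
def publication_year(message: dict, field: str = "issued") -> int:
    """Year of one Crossref date field in a work record ("message").

    field="created" mimics the original behaviour: that is when the
    Crossref record was created, not when the work was published.
    """
    return message[field]["date-parts"][0][0]


# Made-up, trimmed record for illustration only.
example = {
    "created": {"date-parts": [[2005, 1, 12]],
                "date-time": "2005-01-12T00:00:00Z"},
    "issued": {"date-parts": [[1990]]},
}

print(publication_year(example, "created"))  # 2005: old behaviour
print(publication_year(example))            # 1990: after the switch
```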
Now let's look at this example:
Listed are all the dates associated with this paper:
Wikidata has the correct date for this paper (2002, source); however, using the now-preferred 'issued' property, we would say it was published in 2009. 'issued' does seem to be correct most of the time, but it would be great to figure this out. I have more mismatches to go through and will update this issue with other examples.
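One way to hunt for more mismatches like this is sketched below: collect every date field in a Crossref record and flag items whose Wikidata year disagrees with the 'issued' year. The record layout, helper names and sample item ids are hypothetical, and fetching the Wikidata and Crossref data is left out.

```python
from typing import Dict, Iterable, List, Tuple

Dates = Dict[str, Tuple[int, ...]]


def list_dates(message: dict) -> Dates:
    """Every top-level field of a Crossref work that holds "date-parts"."""
    dates = {}
    for key, value in message.items():
        if isinstance(value, dict) and value.get("date-parts"):
            parts = []
            for p in value["date-parts"][0]:
                if p is None:  # unknown parts are sometimes given as null
                    break
                parts.append(int(p))
            dates[key] = tuple(parts)
    return dates


def find_mismatches(
    records: Iterable[Tuple[str, int, dict]]
) -> List[Tuple[str, int, Dates]]:
    """records: (item id, year on Wikidata, Crossref message) triples.

    Returns the records whose Wikidata year differs from the 'issued'
    year, with all their Crossref dates attached for manual inspection.
    """
    mismatches = []
    for item_id, wikidata_year, message in records:
        dates = list_dates(message)
        issued = dates.get("issued")
        if issued and issued[0] != wikidata_year:
            mismatches.append((item_id, wikidata_year, dates))
    return mismatches
```

Returning all dates rather than a yes/no answer makes it easier to spot cases like the one above, where another field agrees with Wikidata and 'issued' is the outlier.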
Hi @carlinmack, sorry for not replying earlier. Could you find out more?
I haven't looked thoroughly through the mismatches, but I haven't found any other similar cases since. I found some documentation for these dates in the API:
- created: sort by created date
- deposited: sort by time of most recent deposit
- indexed: sort by time of most recent index
- is-referenced-by-count: sort by number of times this DOI is referenced by other Crossref DOIs
- issued: sort by issued date (earliest known publication date)
- published: sort by publication date
- published-online: sort by online publication date
- published-print: sort by print publication date
- references-count: sort by number of references included in the references section of the document identified by this DOI
- relevance: sort by relevance score
- score: sort by relevance score
- updated: sort by date of most recent change to metadata, currently the same as deposited
So I think 'issued' is the most correct date, and I should probably just report the issue with 10.1110/ps.4690102.
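If 'issued' really is the earliest known publication date, a record where published-print or published-online is earlier than 'issued' is suspicious and may be worth reporting. A hedged sketch of that check follows; it compares dates only at the precision both sides share (so 2002 and 2002-04 count as equal), and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def date_parts(message: dict, field: str) -> Tuple[int, ...]:
    """Leading non-null date parts of a Crossref date field, e.g. (2002, 4)."""
    value = message.get(field)
    if not isinstance(value, dict) or not value.get("date-parts"):
        return ()
    parts = []
    for p in value["date-parts"][0]:
        if p is None:
            break
        parts.append(int(p))
    return tuple(parts)


def issued_conflicts(
    message: dict,
    others: Sequence[str] = ("published-print", "published-online"),
) -> List[str]:
    """Fields whose date is earlier than 'issued' at their shared precision."""
    issued = date_parts(message, "issued")
    conflicts = []
    for field in others:
        other = date_parts(message, field)
        n = min(len(issued), len(other))  # 0 if either date is missing
        if n and other[:n] < issued[:n]:
            conflicts.append(field)
    return conflicts
```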
I just started a batch of 48k corrections: https://quickstatements.toolforge.org/#/batch/225537 (and the 20 following ones).
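For anyone reproducing this kind of fix, a QuickStatements (V1) batch that swaps a wrong publication date (P577) for a corrected one could be generated roughly as below. This is a sketch under assumptions: the syntax follows the QuickStatements documentation (a leading "-" removes a statement; dates are written as +YYYY-MM-DDT00:00:00Z/precision), the helper is hypothetical, and the linked batches may have been built differently. Q4115189 is the Wikidata sandbox item, used here only as a placeholder.

```python
from typing import List, Tuple

# (year, month, day, precision); 0 for unknown month/day,
# Wikibase precision 9 = year, 10 = month, 11 = day.
Date = Tuple[int, int, int, int]


def qs_time(d: Date) -> str:
    """Format a date as a QuickStatements time value."""
    year, month, day, precision = d
    return f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z/{precision}"


def qs_fix_publication_date(item: str, wrong: Date, right: Date) -> List[str]:
    """Tab-separated V1 lines replacing one P577 value on an item."""
    return [
        f"-{item}\tP577\t{qs_time(wrong)}",  # remove the wrong statement
        f"{item}\tP577\t{qs_time(right)}",   # add the corrected one
    ]


for line in qs_fix_publication_date("Q4115189", (2009, 0, 0, 9), (2002, 0, 0, 9)):
    print(line)
```

The removal line has to match the stored value (including its precision) to take effect, so it is worth checking the existing statement before generating a large batch.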
Hi,
Thanks to @Daniel-Mietchen, we noticed we had an error in our bot, which was taking the wrong date from the Crossref API.
My first reflex was to look at how you are doing it, and it looks like we are doing it the same way:
https://github.com/SuLab/WikidataIntegrator/blob/f2c92d2d3ecd7ee3cba03d81e50b008af4ea0a13/wikidataintegrator/wdi_helpers/publication.py#L362
This is the date the entry was created in CrossRef and not the date of publication, see http://api.crossref.org/works/10.1016%2Fs0031-9422%2800%2994305-x for an example.
This might imply some heavy curation of the article dates on WD...
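To inspect the dates for a given DOI directly, a small standard-library Python sketch can build the same kind of URL as in the example above (the DOI percent-encoded, including "/" and the parentheses) and read the "message" object of the JSON response. The helper names are hypothetical, and fetch_work needs network access.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.crossref.org/works/"


def work_url(doi: str) -> str:
    """Crossref REST URL for a DOI, with every reserved character encoded."""
    return API + urllib.parse.quote(doi, safe="")


def fetch_work(doi: str) -> dict:
    """Download the 'message' part of a Crossref work record."""
    with urllib.request.urlopen(work_url(doi), timeout=30) as resp:
        return json.load(resp)["message"]
```

For example, fetch_work("10.1016/s0031-9422(00)94305-x") returns a dict whose "created" and "issued" entries can be compared side by side.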
Also tagging @bjonnh in case!
Happy to help!