datacite / corpus-data-file

Code and steps used to generate the Data Citation Corpus dump file
MIT License

Create report of assertions where subj_id does not match either accession_number or doi #27

Open lizkrznarich opened 1 week ago

lizkrznarich commented 1 week ago

Each assertion should have a value in either accession_number (source=czi) or doi (source=datacite). This value represents the identifier for the dataset that is cited.

For each assertion, the value in subj_id should match the value in either accession_number or doi. Before the recent data cleanup, however, some assertions had a subj_id that matched neither value.

Re-run the queries and generate a new report to check for mismatches.
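
As a rough illustration of the check described above (not the previous queries, which live in the linked doc), here is a minimal Python sketch. The field names subj_id, accession_number and doi come from this issue. Everything else is an assumption made for illustration: that assertions are available as dicts, that comparison should be case-insensitive, and that a DOI resolver prefix such as https://doi.org/ should be ignored. The helper names normalize and find_mismatches are hypothetical.

```python
from typing import Iterable, List, Optional

# Assumed prefixes to strip before comparing; adjust to the real data.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize(value: Optional[str]) -> str:
    """Trim, lowercase, and drop a leading DOI prefix (assumed normalization)."""
    if not value:
        return ""
    text = value.strip().lower()
    for prefix in DOI_PREFIXES:
        if text.startswith(prefix):
            return text[len(prefix):]
    return text


def find_mismatches(assertions: Iterable[dict]) -> List[dict]:
    """Return assertions whose subj_id matches neither accession_number nor doi.

    An assertion with no accession_number and no doi is also returned,
    since there is nothing for subj_id to match.
    """
    mismatches = []
    for assertion in assertions:
        subj = normalize(assertion.get("subj_id"))
        # Collect the non-empty identifiers that subj_id is allowed to match.
        candidates = {
            normalize(assertion.get("accession_number")),
            normalize(assertion.get("doi")),
        } - {""}
        if subj not in candidates:
            mismatches.append(assertion)
    return mismatches
```

Calling find_mismatches on the rows of the dump file would give the list to write into the new report. If subj_id values in the corpus use a different form than assumed here, normalize is the place to change.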

Previous queries and notes: https://docs.google.com/document/d/1eHRN1cW1zTxLmM-p4XZm8ba8qvAmmx_Ooq-QFeOw43k/edit#heading=h.tra87n9jgjll

Previous report: https://docs.google.com/spreadsheets/d/1cFUaUIekfzsr3q8dS2M-ctGiuTT5saI_tMePgaliPJ8/edit?gid=488132539#gid=488132539