NASA-PDS / doi-service

Service and tools for generating DOIs for PDS bundles, collections, and data sets
https://nasa-pds.github.io/doi-service

Develop script to sync SBN DOIs with DOI Service #312

Closed jordanpadams closed 2 years ago

jordanpadams commented 2 years ago

💡 Description

The script should run via cron and update the DOI Service database with SBN DOIs. These DOIs have different prefixes (10.26033 and 10.26007), but they may eventually be managed by our service, so we might as well start syncing them now. In the meantime, if a user tries to update those DOIs from our account, the update will fail anyway.
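As a rough illustration of the prefix distinction described above, the following sketch classifies a DOI by its prefix. The prefixes come from this thread; the function names and the idea of a "sync-only" flag are hypothetical and are not part of the DOI Service code.

```python
# Prefixes mentioned in this thread. 10.17189 is the PDS prefix used in the crontab below.
PDS_PREFIX = "10.17189"
SBN_PREFIXES = {"10.26007", "10.26033"}


def doi_prefix(doi: str) -> str:
    """Return the registrant prefix, i.e. everything before the first '/'."""
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a DOI of the form prefix/suffix: {doi!r}")
    return prefix


def is_sync_only(doi: str) -> bool:
    """True for SBN DOIs: mirrored locally, but not updatable from the PDS account."""
    return doi_prefix(doi) in SBN_PREFIXES
```

A check like this could let a client reject an update to an SBN DOI early instead of waiting for the request to fail upstream.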

tloubrieu-jpl commented 2 years ago

Should be ready for review by the end of this week.

tloubrieu-jpl commented 2 years ago

A lot of SBN records are missing mandatory fields for the DOI local database. So far it looks like only 100 of 500 can be imported, but Scott needs to investigate to confirm.

collinss-jpl commented 2 years ago

Regarding the incompatibility of the current SBN DOI records:

Attached are dumps of the DOI records returned from DataCite for each of the SBN prefixes (the .txt extension can be safely removed): doi.10.26007.json.txt doi.10.26033.json.txt

The 10.26007 prefix contains 363 records, of which only 24 are usable by the service as-is. For 10.26033, there are 135, of which only 1 (!) was importable.

Records which are not importable fall into one of two categories (or both):

For 10.26007, 90 records are "findable", and for 10.26033 there are 115 "findable" records. All others are in "draft" state and probably don't need immediate correcting.
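The per-state counts above could be reproduced from the attached dumps with something like the sketch below. It assumes each dump follows the DataCite REST API JSON layout, with a top-level "data" list whose entries carry an "attributes" object containing a "state" field. That layout is an assumption about the attachments, and `count_states` is a hypothetical helper name.

```python
import json
from collections import Counter


def count_states(dump_text: str) -> Counter:
    """Count DOI records per state ("findable", "draft", ...) in a JSON dump.

    Assumes the DataCite REST API layout:
    {"data": [{"id": ..., "attributes": {"state": ...}}, ...]}
    """
    records = json.loads(dump_text)["data"]
    return Counter(record["attributes"]["state"] for record in records)


# Tiny made-up sample, only to show the expected shape of the input.
sample = json.dumps({
    "data": [
        {"id": "10.26007/a", "attributes": {"state": "findable"}},
        {"id": "10.26007/b", "attributes": {"state": "draft"}},
        {"id": "10.26007/c", "attributes": {"state": "findable"}},
    ]
})
counts = count_states(sample)
```

For a real dump, the file contents would be read and passed in as `dump_text`.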

collinss-jpl commented 2 years ago

The DOI Service installations on pds-gamma and pds-prod1 have been updated to version 2.1.3, which incorporates the changes from Pull Request #315.

On pds-prod1, a shell script has been added at /home/pds4/pds-doi-service/sync_dois.sh for use with the local crontab. The crontab itself has been updated for user pds4 to now include the following:

# DOI Database Synchronization with DataCite
@daily /home/pds4/pds-doi-service/sync_dois.sh --prefix 10.17189 >> /home/pds4/pds-doi-service/cron_logs/sync_dois_10.17189.log 2>&1
@daily /home/pds4/pds-doi-service/sync_dois.sh --prefix 10.26007 >> /home/pds4/pds-doi-service/cron_logs/sync_dois_10.26007.log 2>&1
@daily /home/pds4/pds-doi-service/sync_dois.sh --prefix 10.26033 >> /home/pds4/pds-doi-service/cron_logs/sync_dois_10.26033.log 2>&1

So DOIs for both our prefix and the two SBN prefixes will be pulled down and synchronized with the transaction database on pds-prod1 once per day (the exact time is up to the cron system, but execution timestamps are recorded in the logs under pds-doi-service/cron_logs).
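The three crontab entries above follow one pattern per prefix. A minimal sketch of generating them, for example when another prefix needs to be added, might look like this. The paths and the `--prefix` flag are copied from the entries above; `cron_entry` is a hypothetical helper, not something shipped with the service.

```python
# Install location and prefixes as they appear in the crontab entries above.
BASE = "/home/pds4/pds-doi-service"
PREFIXES = ["10.17189", "10.26007", "10.26033"]


def cron_entry(prefix: str, schedule: str = "@daily") -> str:
    """Build one crontab line that syncs a prefix and appends output to its own log."""
    log = f"{BASE}/cron_logs/sync_dois_{prefix}.log"
    return f"{schedule} {BASE}/sync_dois.sh --prefix {prefix} >> {log} 2>&1"


if __name__ == "__main__":
    for prefix in PREFIXES:
        print(cron_entry(prefix))
```

Keeping one log file per prefix, as the existing entries do, makes it easy to tell which sync run produced a given message.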