NASA-PDS / doi-service

Service and tools for generating DOIs for PDS bundles, collections, and data sets
https://nasa-pds.github.io/doi-service

Corruption in local database with invalid JSON #318

Closed jordanpadams closed 2 years ago

jordanpadams commented 2 years ago

🐛 Describe the bug

When trying to list all DOIs in the database, we get an error.

📜 To Reproduce

pds-doi-cmd list -start 1990-01-01 -end 2022-02-15 -f label > DataCite_dump_results_20220215.json
...
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 82 column 320 (char 3528)

🕵️ Expected behavior

Able to execute the pds-doi-cmd command above successfully.

⚙️ Engineering Details

per @collinss-jpl

Somehow, an invalid JSON label found its way into the transaction database, and unfortunately it's not clear exactly which label it is. Some debug statements need to be added to print the DOI/PDS ID before the list action parses each label. Alternatively, the current database/transaction history on prod1 could be deleted and resynced from DataCite.

I think our easiest solution to get this fixed is the latter:

the current database/transaction history on prod1 could be deleted and resynced from DataCite.
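For reference, the first option (reporting the identifier of each label that fails to parse) could be sketched roughly as follows. This is not the doi-service code: `find_invalid_labels`, the `records` iterable of (identifier, label text) pairs, and the sample identifiers are all hypothetical stand-ins for however the list action reads labels from the transaction database.

```python
import json


def find_invalid_labels(records):
    """Return (identifier, reason) pairs for labels that are not valid JSON.

    `records` is a hypothetical iterable of (identifier, label_text) pairs;
    the real list action may obtain its labels differently.
    """
    failures = []
    for identifier, label_text in records:
        try:
            json.loads(label_text)
        except json.JSONDecodeError as err:
            # Keep the identifier next to the parser's position info so the
            # offending label can be located.
            reason = f"{err.msg} at line {err.lineno} column {err.colno}"
            failures.append((identifier, reason))
    return failures


# Made-up sample data: the second label has an unescaped double quote.
records = [
    ("10.0000/good-0001", '{"description": "0.8 x 15 arcsec slit"}'),
    ("10.0000/bad-0002", '{"description": "0.8 x 15" slit"}'),
]

for identifier, reason in find_invalid_labels(records):
    print(f"Invalid JSON label for {identifier}: {reason}")
```

Collecting failures instead of stopping at the first one means a single pass would reveal every corrupted label, not just the first.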

tloubrieu-jpl commented 2 years ago

@jordanpadams @rsjoyner the corrupted label prefix is 10.26033.

Who is authoring these labels?

I will keep looking for the exact label that triggers the error and let you know.

tloubrieu-jpl commented 2 years ago

It is from the Small Bodies Node; the DOI is 10.26033/r34k-2238.

The label is: { "data": { "id": "10.26033/r34k-2238", "type": "dois", "attributes": { "doi": "10.26033/r34k-2238", "suffix": "r34k-2238", "identifiers": [ { "identifier": "urn:nasa:pds:gbo.ast-dtype.gartrelleetal.irtf.spectra::1.0", "identifierType": "PDS4 Bundle LIDVID" } ], "creators": [ { "nameType": "Personal", "name": "Gordon M. Gartrelle", "nameIdentifiers": [ { "schemeUri": "https://orcid.org", "nameIdentifier": "https://orcid.org/0000-0003-1422-9271", "nameIdentifierScheme": "ORCID" } ] }, { "nameType": "Personal", "name": "Paul S. Hardersen", "nameIdentifiers": [ { "schemeUri": "https://orcid.org", "nameIdentifier": "https://orcid.org/0000-0002-0440-9095", "nameIdentifierScheme": "ORCID" } ] }, { "nameType": "Personal", "name": "Matthew R. M. Izawa", "nameIdentifiers": [ { "schemeUri": "https://orcid.org", "nameIdentifier": "https://orcid.org/0000-0001-5456-2912", "nameIdentifierScheme": "ORCID" } ] }, { "nameType": "Personal", "name": "Matthew C. Nowinski", "nameIdentifiers": [ ] } ], "titles": [ { "title": "Gartrelle et al. IRTF Asteroid Spectra Bundle V1.0", "lang": "en" } ], "publisher": "NASA Planetary Data System", "publicationYear": "2021", "subjects": [ { "subject": "Asteroids" } ], "contributors": [ { "nameType": "Organizational", "name": "Planetary Data System: PDS Small Bodies Node", "contributorType": "DataCurator" } ], "types": { "resourceTypeGeneral": "Dataset", "resourceType": "PDS4 Refereed Data Bundle " }, "relatedIdentifiers": [ ], "descriptions": [ { "description": "This data set is comprised of the VNIR (0.69-2.5 micron) spectra of twenty-five D-type asteroids from varying Solar System locations. The spectra were obtained from NASA/IRTF on Mauna Kea between 2016-2019 using the SpeX instrument in Low-Resolution Prism mode and the 0.8 x 15" slit. Guiding was performed using spillover light from the slit (GuideDog) for targets with apparent magnitude >15.5. 
For targets fainter than magnitude 15.5, guiding was accomplished using the MORIS CCD imager.", "descriptionType": "Abstract", "lang": "en" } ], "url": "https://sbn.psi.edu/pds/resource/girtfspec.html", "created": "2021-04-01T22:06:37.000000Z", "updated": "2022-02-11T19:16:34.000000Z", "state": "findable", "language": "en", "schemaVersion": "http://datacite.org/schema/kernel-4" } } }

Something is wrong with the descriptions field: it contains an unescaped double quote ("), which ends the JSON string early. I believe we or SBN should replace the text '0.8 x 15" slit' with something that has no double quote. Two single quotes would work.
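To illustrate the failure mode, here is a small standalone sketch (not taken from doi-service) showing that a double quote pasted into a JSON string by plain concatenation breaks parsing, while serializing the same text with `json.dumps` escapes it. The `description` value is a shortened version of the abstract above. How the label was actually produced is not known from this thread, so the concatenation step is only an assumption for demonstration.

```python
import json

# Shortened version of the problematic abstract text.
description = 'the 0.8 x 15" slit'

# Assumed for illustration: the label text is built by string concatenation,
# so the inner double quote is not escaped.
broken = '{"description": "' + description + '"}'
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    print("parse failed:", err.msg)

# Letting the json module serialize the value escapes the quote,
# and the text round-trips unchanged.
safe = json.dumps({"description": description})
assert json.loads(safe)["description"] == description
print(safe)
```

Escaping is an alternative to rewording the abstract. Replacing the double quote with two single quotes, as suggested above, avoids the problem without depending on how the label is serialized.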

tloubrieu-jpl commented 2 years ago

Email sent to @jordanpadams and @rsjoyner today. We need to contact someone at SBN to make the update.

jordanpadams commented 2 years ago

Forwarded to SBN for updates.

tloubrieu-jpl commented 2 years ago

SBN has made the update. @tloubrieu-jpl needs to synchronize the data again and retest the list command.

tloubrieu-jpl commented 2 years ago

Synchronization is done and the list command works again on pds-prod1.