Closed rsjoyner closed 1 year ago
Based on stacktrace and confirmation that doi db in question has data, suspect bad data.
Will need access to prod box or a copy of prod's path/to/pds-doi-service/transaction_history
to perform further troubleshooting, which is likely worthwhile both for user and for project as bad labels should (presumably) fail during reserve/release.
@rsjoyner when you have a chance can you share the SQLite .db file that was used when you saw the error ?
Thanks
@tloubrieu-jpl I'll LFT you a copy of the sqlite db which Ron provided, but my guess is that we'll need the transactions' json to get any further on this
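If a copy of the transaction history does become available, something like the following could narrow things down quickly. This is only a rough sketch: it assumes each transaction directory holds an `output.json` (as the path later in this thread suggests), and `find_invalid_json` is a hypothetical helper, not part of pds-doi-service.

```python
import json
from pathlib import Path


def find_invalid_json(history_root):
    """Return (path, message) pairs for every output.json that fails to parse.

    Hypothetical helper: walks the given transaction_history copy recursively
    and attempts to decode each output.json with the standard json module.
    """
    failures = []
    for path in sorted(Path(history_root).rglob("output.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            # lineno/colno/msg are standard attributes of JSONDecodeError
            failures.append((path, f"line {err.lineno} column {err.colno}: {err.msg}"))
    return failures
```

Calling `find_invalid_json("path/to/pds-doi-service/transaction_history")` would then list only the labels that the `list` action cannot decode.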
Unable to reproduce on pds@pdscloud-prod1 using the instance at /home/pds4/pds-doi-service.
Awaiting confirmation from @rsjoyner wrt whether
pds-doi-service
The offending DOI has id 10.26033/3k3c-5713 and refers to product urn:nasa:pds:satellite-phoebe.cassini.shape-models-maps::1.0.
The label file is located at pds-prod1:/home/pds4/pds-doi-service/transaction_history/unk/10.26033/3k3c-5713/2022-12-14T20:06:48+00:00/output.json
and contains the following malformed JSON attribute (N.B. the unescaped double quotes inside the description value):
"descriptions": [
{
"description": "This bundle contains a shape model for the Saturnian moon Phoebe, along with quality assessment data. The global model is similar to the previously archived "Gaskell Phoebe Shape Model", but is provided in multiple formats.",
"descriptionType": "Abstract",
"lang": "en"
}
]
@jordanpadams this should unblock @rsjoyner for the time being, but I should loop back to this later and ensure that we aren't parsing values from XML labels without escaping them for JSON write. If that's not the case, and the problem is in the source label data (unlikely), then we'll need to make a decision about how doi-service should handle that.
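For reference, a minimal illustration of the suspected failure mode. This assumes the output label is assembled by text templating or string interpolation somewhere along the way; I have not confirmed that this is how doi-service writes it. The description text is shortened from the offending record.

```python
import json

# Value as it would come out of an XML label: contains literal double quotes
description = 'similar to the previously archived "Gaskell Phoebe Shape Model", but is provided in multiple formats.'

# Naive interpolation (assumed failure mode): the inner quotes are not escaped
naive = '{"description": "%s"}' % description
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# Serialising through json.dumps escapes the inner quotes as \"
safe = json.dumps({"description": description})
round_tripped = json.loads(safe)["description"]
```

Here the naive string fails to parse, while the `json.dumps` output parses back to the original description unchanged. If the labels are rendered from a template, the equivalent fix would be to escape each value for JSON before substitution.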
@rsjoyner are you able to provide a copy of the input label for urn:nasa:pds:satellite-phoebe.cassini.shape-models-maps::1.0?
Based on the DOI ("doi": "10.26033/ehkj-xj95"), the fact that the base DOI for the DOI service is "10.17189", that the node is SBN, and that I can't locate the original XML file, I suspect that this DOI was NOT minted by the EN DOI service?
Was there a bulk "merge" of SBN DOIs on "updated": "2022-02-08T18:07:35.000000Z"? Or am I just confused once again?
Note that the DOI value and the description in the errant DOI (above) do not match the metadata in my "dump results".
The URN is also different from the errant one: "identifier": "urn:nasa:pds:gaskell.phoebe.shape-model::1.0"
This is very strange to me. Here is the DOI metadata that has: "title": "Gaskell Phoebe Shape Model Bundle V1.0". This is the only record having "Gaskell".
{
"id": "10.26033/ehkj-xj95",
"type": "dois",
"attributes": {
"doi": "10.26033/ehkj-xj95",
"suffix": "ehkj-xj95",
"identifiers": [
{
"identifier": "urn:nasa:pds:gaskell.phoebe.shape-model::1.0",
"identifierType": "PDS4 Bundle LIDVID"
}
],
"creators": [
{
"nameType": "Personal",
"name": "Robert W. Gaskell",
"nameIdentifiers": [
{
"schemeUri": "https://orcid.org",
"nameIdentifier": "https://orcid.org/0000-0002-2293-7879",
"nameIdentifierScheme": "ORCID"
}
]
}
],
"titles": [
{
"title": "Gaskell Phoebe Shape Model Bundle V1.0",
"lang": "en"
}
],
"publisher": "NASA Planetary Data System",
"publicationYear": "2020",
"subjects": [
{ "subject": "Saturnian satellites" }
],
"contributors": [
{
"nameType": "Organizational",
"name": "Planetary Data System: PDS Small Bodies Node",
"contributorType": "DataCurator"
}
],
"types": {
"resourceTypeGeneral": "Dataset",
"resourceType": "PDS4 Refereed Data Bundle"
},
"relatedIdentifiers": [
],
"descriptions": [
{
"description": "The shape model of Phoebe derived by Robert Gaskell from Cassini images. The model is provided in the implicitly connected quadrilateral (ICQ) format. This version of the model was prepared on August 4, 2012. Vertex-facet versions of the models are also provided.",
"descriptionType": "Abstract",
"lang": "en"
}
],
"url": "https://sbn.psi.edu/pds/resource/phoebeshape.html",
"created": "2021-05-21T21:46:41.000000Z",
"updated": "2022-02-08T18:12:24.000000Z",
"state": "findable",
"language": "en",
"schemaVersion": "http://datacite.org/schema/kernel-4"
}
},
Will create a new ticket to better handle escaping of quotes in input data.
Describe the bug
When attempting to generate a report of all DOIs between dates, no transactions are listed in the output file:
Generates file of '0' length.
To Reproduce
Steps to reproduce the behavior:
Traceback (most recent call last):
  File "/home/pds4/pds-doi-service/bin/pds-doi-cmd", line 8, in <module>
    sys.exit(main())
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/cmd/pds_doi_cmd.py", line 42, in main
    output = action.run(**kwargs)
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/list.py", line 340, in run
    dois, _ = self._web_parser.parse_dois_from_label(label_contents)
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/datacite/datacite_web_parser.py", line 354, in parse_dois_from_label
    datacite_records = json.loads(label_text)["data"]
  File "/usr/local/python-3.9.5/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/python-3.9.5/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/python-3.9.5/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 103 column 199 (char 4429)
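The line, column, and char offset in that final message are enough to find the offending text once the label file is in hand. A small sketch of how that could be done, where `describe_json_error` is a hypothetical helper and not something in pds-doi-service:

```python
import json


def describe_json_error(text, context=40):
    """Return a message with the text surrounding a JSON decode error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset reported as "(char N)" in the message
        start = max(err.pos - context, 0)
        snippet = text[start:err.pos + context]
        return f"{err.msg} at line {err.lineno} column {err.colno} (char {err.pos}): {snippet!r}"
    return None
```

Running it over the contents of a suspect label prints the surrounding characters, which makes an unescaped quote or similar problem easy to spot by eye.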
Expected behavior
The expected behavior is that a valid set of DOI metadata is written to the output file.
Version of Software Used
pds-doi-service==2.1.3