NASA-PDS / doi-service

Service and tools for generating DOIs for PDS bundles, collections, and data sets
https://nasa-pds.github.io/doi-service

Still -- Unable to generate / export json report of DOI metadata #398

Closed rsjoyner closed 1 year ago

rsjoyner commented 1 year ago

🐛 Describe the bug

When attempting to generate a report of all DOIs between two dates, no transactions are listed in the output file:

pds-doi-cmd list -start 1990-01-01 -end 2022-12-27 -f label > DataCite_dump_results_20221227.json

This produces an error and an output file of length 0.

📜 To Reproduce

Steps to reproduce the behavior:
(1) ssh rsjoyner@pds-prod1
(2) ssh pds4@pds-prod1
(3) source /home/pds4/pds-doi-service/bin/activate
(4) pds-doi-cmd list -start 1990-01-01 -end 2022-12-27 -f label > DataCite_dump_results_20221227.json

Service generates ERROR:

Traceback (most recent call last):
  File "/home/pds4/pds-doi-service/bin/pds-doi-cmd", line 8, in <module>
    sys.exit(main())
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/cmd/pds_doi_cmd.py", line 42, in main
    output = action.run(**kwargs)
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/list.py", line 340, in run
    dois, _ = self._web_parser.parse_dois_from_label(label_contents)
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/datacite/datacite_web_parser.py", line 354, in parse_dois_from_label
    datacite_records = json.loads(label_text)["data"]
  File "/usr/local/python-3.9.5/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/python-3.9.5/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/python-3.9.5/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 103 column 199 (char 4429)
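The error message gives a line, column, and character offset into the text being parsed. As a rough sketch (not part of pds-doi-service), the standard-library `json.JSONDecodeError` exposes those same values as attributes, which makes it possible to print the text around the failure. The helper name `describe_json_error` and the sample record are made up for illustration.

```python
import json

def describe_json_error(text, context=30):
    """Return None if text parses as JSON, else a dict locating the failure."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        start = max(err.pos - context, 0)
        return {
            "message": err.msg,      # e.g. "Expecting ',' delimiter"
            "line": err.lineno,      # 1-based line number
            "column": err.colno,     # 1-based column number
            "char": err.pos,         # 0-based offset into the text
            "snippet": text[start:err.pos + context],
        }
    return None

# Made-up record with an unescaped double quote inside a string value.
bad = '{"title": "The "Big" Survey", "doi": "10.12345/abcdef"}'
report = describe_json_error(bad)
print(report)
```

Running the same helper on the contents of a stored output.json should point at the offending characters directly, assuming the file is read as text first.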

🕵️ Expected behavior

The expected behavior is that a valid set of DOI metadata is written to the output file.

📚 Version of Software Used

pds-doi-service==2.1.3

🩺 Test Data / Additional context

🐞 Screenshots

🖥 System Info

🦄 Related requirements

⚙️ Engineering Details

alexdunnjpl commented 1 year ago

The two problematic files on pds-prod1 have been manually fixed

.../transaction_history/unk/10.26033/r34k-2238/2022-02-11T19:16:34+00:00/output.json
.../transaction_history/unk/10.26033/3k3c-5713/2022-12-14T20:06:48+00:00/output.json

Root cause still being investigated - likely source is .../pds-doi-service/sync_dois.sh --prefix 10.26033
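To check whether any other stored records have the same problem, a small scan over the transaction history could look like the sketch below. It assumes each transaction directory holds a file named output.json, as in the two paths above; the root directory is left for the caller to supply, and `find_invalid_json` is a hypothetical helper, not something provided by pds-doi-service.

```python
import json
from pathlib import Path

def find_invalid_json(root, filename="output.json"):
    """Yield (path, error) for each matching file under root that is not valid JSON."""
    for path in sorted(Path(root).rglob(filename)):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            yield path, err

# Example use (the directory is whatever holds transaction_history locally):
# for path, err in find_invalid_json("/path/to/transaction_history"):
#     print(f"{path}: {err}")
```

This only reports files that fail to parse; it does not attempt to repair them.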

alexdunnjpl commented 1 year ago

Syncing prefix 10.26033 on a fresh db results in correct outputs being stored in the transaction history, and no error when running the dump.

Root cause likely fixed in #339

alexdunnjpl commented 1 year ago

Per @rsjoyner, ~7am, dump is failing again.

Inspection reveals 3k3c-5713 contains unescaped quotes again, and has a modification timestamp of midnight last night, so was presumably overwritten by a cron again.

Need to confirm version of doi-service in use on pds-prod1 and ensure it actually has #339

Deployed version is v2.1.3, fix requires v2.2.1 or later (v2.3.5 latest)
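The version comparison above can be sketched as a tuple comparison. This is only an illustration for plain dotted version strings; `parse_version` and `has_fix` are made-up names. On a host where the package is installed, `importlib.metadata.version("pds-doi-service")` should report the installed version string, assuming the distribution name matches the one given in the issue.

```python
def parse_version(text):
    """Turn 'v2.1.3' or '2.1.3' into a tuple of ints such as (2, 1, 3)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))

def has_fix(deployed, minimum="2.2.1"):
    """True if the deployed version is at or above the minimum that carries the fix."""
    return parse_version(deployed) >= parse_version(minimum)

print(has_fix("v2.1.3"))  # False: the version deployed on pds-prod1
print(has_fix("v2.3.5"))  # True: the latest version mentioned in this thread
```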

alexdunnjpl commented 1 year ago

As an aside, fixing the stored data enables pds-doi-cmd list -start 2022-12-14 -end 2022-12-15 -f label > DataCite_dump_results_junk.json to run, but the output from this command is not valid JSON and contains unescaped double-quotes within values. The cause is unclear and requires further investigation.

This is probably fixed as part of #339 as well, as I can't replicate locally.
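For reference, one common way unescaped double-quotes end up inside JSON values is building the text by string interpolation instead of a JSON serializer. Whether that is what happened in the service is not established in this thread; the sketch below only illustrates the symptom, using a made-up title.

```python
import json

title = 'Survey of "dark" regions'  # made-up value containing double quotes

# Naive string interpolation leaves the inner quotes unescaped.
naive = '{"title": "%s"}' % title

# json.dumps escapes the inner quotes, so the text round-trips.
safe = json.dumps({"title": title})

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(naive))  # False
print(is_valid_json(safe))   # True
```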

alexdunnjpl commented 1 year ago

Confirmed:

miguelp1986 commented 1 year ago

@alexdunnjpl @jordanpadams I'm attempting to test this, and I cannot get pds-doi-cmd to generate any queries from the command:

 pds-doi-cmd list -start 1990-01-01 -end 2022-12-27 -f label > ../../DataCit_dump.json 

INFO pds_doi_service.core.util.logging:_get_config Searching for configuration files from candidates ['/Users/MPena/repositories/doi-service-2.3.6/src/pds_doi_service/core/util/conf.default.ini', '/Users/MPena/repositories/doi-service-2.3.6/venv/pds_doi_service.ini']
INFO pds_doi_service.core.util.logging:_get_config Using configs (with later files overwriting previous files' values): ['/Users/MPena/repositories/doi-service-2.3.6/src/pds_doi_service/core/util/conf.default.ini', '/Users/MPena/repositories/doi-service-2.3.6/venv/pds_doi_service.ini']
INFO pds_doi_service.core.cmd.pds_doi_cmd:main run_dir /Users/MPena/repositories/doi-service-2.3.6/venv/bin
DEBUG pds_doi_service.core.outputs.service:get_doi_record_service Returning instance of DOIDataCiteRecord for service type datacite
DEBUG pds_doi_service.core.outputs.service:get_web_parser_service Returning instance of DOIDataCiteWebParser for service type datacite
DEBUG pds_doi_service.core.actions.action:parse_arguments format = label
DEBUG pds_doi_service.core.actions.action:parse_arguments doi = 10.12345/abcdef
DEBUG pds_doi_service.core.actions.action:parse_arguments ids = None
DEBUG pds_doi_service.core.actions.action:parse_arguments node = None
DEBUG pds_doi_service.core.actions.action:parse_arguments status = None
DEBUG pds_doi_service.core.actions.action:parse_arguments start_update = None
DEBUG pds_doi_service.core.actions.action:parse_arguments end_update = None
DEBUG pds_doi_service.core.actions.action:parse_arguments submitter = None
INFO pds_doi_service.core.db.doi_database:create_connection Connecting to SQLite3 (ver 2.6.0) database /Users/MPena/repositories/doi-service-2.3.6/venv/doi.db
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Checking for existence of DOI table doi
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Executing query: SELECT count(name) FROM sqlite_master WHERE type='table' AND name='doi'
DEBUG pds_doi_service.core.db.doi_database:check_if_table_exists o_table_exists_flag: True
DEBUG pds_doi_service.core.db.doi_database:parse_criteria Calling get_query_criteria_doi with value ['10.12345/abcdef']
DEBUG pds_doi_service.core.db.doi_database:_form_query_with_wildcards WHERE subclause: AND (doi IN (:doi_0) )
DEBUG pds_doi_service.core.db.doi_database:parse_criteria criteria_str: AND (doi IN (:doi_0) )
DEBUG pds_doi_service.core.db.doi_database:parse_criteria dict_entry: {'doi_0': '10.12345/abcdef'}
DEBUG pds_doi_service.core.db.doi_database:select_latest_rows SELECT query_string: SELECT * from doi WHERE is_latest=1 AND (doi IN (:doi_0) ) ORDER BY date_updated
DEBUG pds_doi_service.core.db.doi_database:select_latest_rows Query returned 0 result(s)

I'm not getting the JSON encoding error, but since it's not generating any data, I can't determine if this passes the test. I'm pretty unfamiliar with this tool, so I may be missing something simple. Any help would be appreciated. Thank you.
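One thing the log does show: the parsed arguments include doi = 10.12345/abcdef while start_update and end_update are None, and the resulting query filters on that single DOI. A minimal sketch with an in-memory SQLite database shows that such a query returns no rows whenever no stored record has that DOI, whatever the dates are. The query text is copied from the log; the table layout and the inserted row are simplified assumptions for illustration, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: only the columns named in the logged query.
conn.execute("CREATE TABLE doi (doi TEXT, is_latest INTEGER, date_updated TEXT)")
# Illustrative row; the date is a placeholder, not real data.
conn.execute("INSERT INTO doi VALUES ('10.26033/3k3c-5713', 1, '2022-12-14')")

# Query string as it appears in the log output.
query = (
    "SELECT * from doi WHERE is_latest=1 "
    "AND (doi IN (:doi_0) ) ORDER BY date_updated"
)

rows_placeholder = conn.execute(query, {"doi_0": "10.12345/abcdef"}).fetchall()
rows_stored = conn.execute(query, {"doi_0": "10.26033/3k3c-5713"}).fetchall()

print(len(rows_placeholder))  # 0
print(len(rows_stored))       # 1
```

If that reading of the log is right, it may be worth checking where the 10.12345/abcdef value comes from and whether the local doi.db contains any records at all.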