Closed tloubrieu-jpl closed 10 months ago
I would consider changing the priority for this ticket https://github.com/NASA-PDS/doi-service/issues/8 to implement the deactivation of a DOI.
The other options I looked at, but dismissed are:
Question for @alexdunnjpl and @collinss-jpl: if we assign a 'deactivated' status to a DOI in the local doi.db SQLite database, it is not going to be overwritten by the daily synchronization, because the record at DataCite has not been updated. Is that correct?
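To make the question concrete, here is a minimal sketch of what a local-only status change could look like. It runs against an in-memory SQLite table named `doi` with a few columns borrowed from the daily report fields. The real `doi.db` schema may differ, and `deactivate_doi` is a hypothetical helper, not part of pds-doi-service. The sketch does not show whether the daily synchronization would later overwrite the value.

```python
import sqlite3


def deactivate_doi(conn, doi):
    """Set the local status of the latest row for `doi` to 'deactivated'.

    Returns the number of rows that were changed.
    """
    cur = conn.execute(
        "UPDATE doi SET status = 'deactivated' WHERE doi = ? AND is_latest = 1",
        (doi,),
    )
    conn.commit()
    return cur.rowcount


# Assumption: a simplified stand-in for the local table, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doi (doi TEXT, status TEXT, is_latest INTEGER)")
conn.execute("INSERT INTO doi VALUES ('10.17189/n0dm-0014', 'review', 1)")

changed = deactivate_doi(conn, "10.17189/n0dm-0014")
status = conn.execute(
    "SELECT status FROM doi WHERE doi = ?", ("10.17189/n0dm-0014",)
).fetchone()[0]
```

Only the local row is touched here; nothing is sent to DataCite, which is the situation the question is about.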
@tloubrieu-jpl hard to say - I'd need to test it out to be sure but based on my memory of how it's supposed to work, that seems plausible.
@tloubrieu-jpl will check the gamma deployment to see why these DOIs are sent in the report and "remove" them by changing their status.
@rsjoyner @c-suh @jordanpadams I confirm that the DOIs Ron is seeing in his daily reports come from pdscloud-gamma, where their status is 'review' (instead of 'findable' in production).
3 questions so far:
1) I see that on pds-gamma a crontab command synchronizes, every day, all the DOIs from the production prefix "10.17189". Looking at it with fresh eyes (having forgotten everything I did in the past with the DOI service), it seems odd that we import production records into a test database. Can you remind me why we do that, or whether we should stop?
2) I don't understand why the status is 'review' in the local gamma database whereas it is 'findable' in the DataCite system.
3) I tried to update the status of the record with the command below, but it failed. My guess is that the gamma deployment is only authorized to work with the test DataCite prefix.
$ pds-doi-cmd release -i /data/home/pds4/pds-doi-service/transaction_history/eng/10.17189/n0dm-0014/2022-07-29T00\:43\:45.273306+00\:00/output.json --no-review --submitter loubrieu@jpl.nasa.gov
INFO pds_doi_service.core.util.logging:_get_config Searching for configuration files from candidates ['/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/util/conf.default.ini', '/data/home/pds4/pds-doi-service/pds_doi_service.ini']
INFO pds_doi_service.core.util.logging:_get_config Using configs (with later files overwriting previous files' values): ['/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/util/conf.default.ini', '/data/home/pds4/pds-doi-service/pds_doi_service.ini']
INFO pds_doi_service.core.cmd.pds_doi_cmd:main run_dir /data/home/pds4
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /data/home/pds4/pds-doi-service/transaction_history/eng/10.17189/n0dm-0014/2022-07-29T00:43:45.273306+00:00/output.json
INFO pds_doi_service.core.input.input_util:parse_json_file Parsing json file output.json
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
WARNING pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Record 0: Could not parse optional field "rights_list"
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.db.doi_database:create_connection Connecting to SQLite3 (ver 2.6.0) database /data/home/pds4/pds-doi-service/doi.db
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Checking for existence of DOI table doi
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Executing query: SELECT count(name) FROM sqlite_master WHERE type='table' AND name='doi'
INFO pds_doi_service.core.outputs.doi_validator:_check_field_site_url Landing page URL https://pds.nasa.gov/ds-view/pds/viewBundle.jsp?identifier=urn%3Anasa%3Apds%3Agalileo-epd-cal-corrected&version=1.0 is reachable
Traceback (most recent call last):
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/web_client.py", line 89, in _submit_content
response.raise_for_status()
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.test.datacite.org/dois/10.17189/n0dm-0014
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 286, in run
output_doi, o_doi_label = self._web_client.submit_content(
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/datacite/datacite_web_client.py", line 93, in submit_content
response_text = super()._submit_content(
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/web_client.py", line 95, in _submit_content
raise WebRequestException(
pds_doi_service.core.entities.exceptions.WebRequestException: DOI submission request to DataCite service failed, reason: 403 Client Error: Forbidden for url: https://api.test.datacite.org/dois/10.17189/n0dm-0014
Details: ('{"errors":[{"status":"403","title":"You are not authorized to access this '
'resource."}]}')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/home/pds4/pds-doi-service/bin/pds-doi-cmd", line 8, in <module>
sys.exit(main())
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/cmd/pds_doi_cmd.py", line 42, in main
output = action.run(**kwargs)
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 322, in run
raise CriticalDOIException(str(err))
pds_doi_service.core.entities.exceptions.CriticalDOIException: DOI submission request to DataCite service failed, reason: 403 Client Error: Forbidden for url: https://api.test.datacite.org/dois/10.17189/n0dm-0014
Details: ('{"errors":[{"status":"403","title":"You are not authorized to access this '
'resource."}]}')
Next, I am going to investigate why the status in the local database does not match the status at DataCite (question 2).
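One way to check a single record for this kind of mismatch is sketched below. It reads the `state` attribute from a DataCite REST API response and compares it with the local status. The helper names are hypothetical, and the sketch assumes the local status vocabulary maps directly onto DataCite's `state` values, which may not hold for every local status. The sample payload is hand-written and trimmed, not a real response.

```python
import json
import urllib.request

# Production DataCite REST endpoint for DOI records.
DATACITE_API = "https://api.datacite.org/dois/"


def fetch_datacite_record(doi, base_url=DATACITE_API):
    """Fetch the public JSON record for a DOI (performs a network call)."""
    with urllib.request.urlopen(base_url + doi) as response:
        return json.load(response)


def datacite_state(record):
    """Extract the DOI state from a DataCite JSON:API response body."""
    return record["data"]["attributes"]["state"]


def status_mismatch(local_status, record):
    """Return (local, remote) when the two differ, otherwise None."""
    remote = datacite_state(record)
    return None if remote == local_status else (local_status, remote)


# Offline illustration with a trimmed, hand-written payload.
sample = {"data": {"id": "10.17189/n0dm-0014", "attributes": {"state": "findable"}}}
mismatch = status_mismatch("review", sample)
```

With a live record, `fetch_datacite_record("10.17189/n0dm-0014")` would supply the payload instead of `sample`.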
@collinss-jpl @alexdunnjpl any thoughts on my comment above ?
My memory is also a bit fuzzy on this, but here are my answers
For question 1: it is a bit odd that we pull the production DOIs into the gamma database. Maybe we set this up as a way to test the synchronization script before we had DOIs available in the production DataCite server. If the synchronization is now occurring for real with our production DOI service, we could probably disable the crontab on gamma.
For question 2: I noticed from your traceback that the DOI service deployment on gamma is configured to talk to https://api.test.datacite.org rather than the actual production DataCite API (https://api.datacite.org/dois/). This could explain why the DOI in production is findable, whereas on gamma it looks like it's still in review.
For question 3: When I do a GET on https://api.test.datacite.org/dois/10.17189/n0dm-0014 via my browser, I get a 404 back, meaning the record does not actually exist in the test DataCite environment. This is probably why you get a 403 back when trying to update the record. Since the DOI does not exist in the test DataCite environment, it's probably safe to just purge the record from the local database on gamma.
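The browser check can be scripted along the following lines. This is only a sketch: `where_does_doi_exist` and `http_status` are hypothetical helpers, and the stubbed statuses in the offline illustration mirror what was reported in this thread (404 on test, findable in production) rather than a live result.

```python
import urllib.error
import urllib.request

# The two DataCite environments mentioned in this thread.
ENDPOINTS = {
    "test": "https://api.test.datacite.org/dois/",
    "production": "https://api.datacite.org/dois/",
}


def http_status(url):
    """Return the HTTP status code of a GET, including error statuses."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def where_does_doi_exist(doi, get_status=http_status):
    """Map each environment name to True if a GET on the DOI returns 200."""
    return {env: get_status(base + doi) == 200 for env, base in ENDPOINTS.items()}


# Offline illustration: a stub standing in for the real network calls.
def fake_status(url):
    return 404 if "api.test.datacite.org" in url else 200


result = where_does_doi_exist("10.17189/n0dm-0014", get_status=fake_status)
```

Passing the status function as a parameter keeps the check usable without network access; dropping the `get_status` argument makes it query both environments for real.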
Thanks @collinss-jpl. You are right, we need to clarify why we are importing production DOIs into our pre-prod database.
For question 2, I am thinking the JSON format of these records might be corrupted in a way that prevents our synchronization code from importing them into the local database.
I tried to manually import the JSON that I copied from DataCite:
$ pds-doi-cmd release -i ~/tmp/doi_pb.json --submitter loubrieu@jpl.nasa.gov --no-review
INFO pds_doi_service.core.util.logging:_get_config Searching for configuration files from candidates ['/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/util/conf.default.ini', '/data/home/pds4/pds-doi-service/pds_doi_service.ini']
INFO pds_doi_service.core.util.logging:_get_config Using configs (with later files overwriting previous files' values): ['/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/util/conf.default.ini', '/data/home/pds4/pds-doi-service/pds_doi_service.ini']
INFO pds_doi_service.core.cmd.pds_doi_cmd:main run_dir /data/home/pds4/pds-doi-service
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /home/pds4/tmp/doi_pb.json
INFO pds_doi_service.core.input.input_util:parse_json_file Parsing json file doi_pb.json
WARNING pds_doi_service.core.input.input_util:parse_json_file Unable to parse DOI objects from provided json file "/home/pds4/tmp/doi_pb.json"
Reason: JSON record at index 0 does not appear to be in DataCite format.
Please ensure the label is valid DataCite JSON (as opposed to OSTI-format).
Traceback (most recent call last):
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/input/input_util.py", line 488, in parse_json_file
validator.validate(json_contents)
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/datacite/datacite_validator.py", line 109, in validate
raise InputFormatException(error_message)
pds_doi_service.core.entities.exceptions.InputFormatException: JSON record at index 0 does not appear to be in DataCite format.
Please ensure the label is valid DataCite JSON (as opposed to OSTI-format).
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/home/pds4/pds-doi-service/bin/pds-doi-cmd", line 8, in <module>
sys.exit(main())
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/cmd/pds_doi_cmd.py", line 42, in main
output = action.run(**kwargs)
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 313, in run
raise err
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 272, in run
dois = self._parse_input(self._input)
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 129, in _parse_input
return self._input_util.parse_dois_from_input_file(input_file)
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/input/input_util.py", line 640, in parse_dois_from_input_file
dois = self._read_from_path(input_file)
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/input/input_util.py", line 533, in _read_from_path
dois = read_function(path)
File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/input/input_util.py", line 494, in parse_json_file
raise InputFormatException(msg)
pds_doi_service.core.entities.exceptions.InputFormatException: Unable to parse DOI objects from provided json file "/home/pds4/tmp/doi_pb.json"
Reason: JSON record at index 0 does not appear to be in DataCite format.
Please ensure the label is valid DataCite JSON (as opposed to OSTI-format).
Oh no, I think we need a wrapper around the DOI record. Let me try that.
@tloubrieu-jpl will drop the DOI database on gamma.
@tloubrieu-jpl just got back - ping me if you need further action/investigation from me
I re-initialized the gamma DOI database, and there are no longer any DOIs in 'review' status, so @rsjoyner's daily report should be empty.
💡 Description
See Ron's email:
Correction: All records greater than: "date_added": "2022-07-28…
RJ
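The criterion above could be applied to the daily report with something like the sketch below. The function name is hypothetical, the sample holds two records trimmed from the report quoted at the end of this thread, and the cutoff timestamp is a placeholder because the email elides the exact time.

```python
import json
from datetime import datetime


def records_to_purge(report_json, cutoff_iso):
    """Return the DOIs still in 'review' whose date_added is after the cutoff."""
    cutoff = datetime.fromisoformat(cutoff_iso)
    return [
        record["doi"]
        for record in json.loads(report_json)
        if record["status"] == "review"
        and datetime.fromisoformat(record["date_added"]) > cutoff
    ]


# Two records trimmed from the daily report; other fields are omitted.
SAMPLE_REPORT = json.dumps([
    {"doi": "10.17189/btz6-5a82", "status": "review",
     "date_added": "2022-07-29T00:21:16.004772+00:00"},
    {"doi": "10.17189/6skx-3c53", "status": "review",
     "date_added": "2022-10-31T18:27:51+00:00"},
])

# Placeholder cutoff: the time of day is an assumption, not taken from the email.
selected = records_to_purge(SAMPLE_REPORT, "2022-07-28T00:00:00+00:00")
```

The function only lists candidates; removing them from the database or changing their status remains a separate, deliberate step.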
From: Joyner, Ronald (US 398G) <> Sent: Friday, August 4, 2023 8:12 AM To: Loubrieu, Thomas G (US 398F) [thomas.g.loubrieu@jpl.nasa.gov](mailto:thomas.g.loubrieu@jpl.nasa.gov) Subject: RE: DOI daily review on pdscloud-gamma
Howdy,
All records greater than: "date_added": "2022-07-29…
From: Loubrieu, Thomas G (US 398F) [thomas.g.loubrieu@jpl.nasa.gov](mailto:thomas.g.loubrieu@jpl.nasa.gov) Sent: Friday, August 4, 2023 6:45 AM To: Joyner, Ronald (US 398G) [ronald.joyner@jpl.nasa.gov](mailto:ronald.joyner@jpl.nasa.gov) Subject: Re: DOI daily review on pdscloud-gamma
Hi Ron,
I could manually remove each entry that you sent, but would you have a criterion (e.g. time of last update) for which entries should be removed? I don't think I had an answer from you to my email here.
Thanks,
Thomas
From: Loubrieu, Thomas G (US 398F) [thomas.g.loubrieu@jpl.nasa.gov](mailto:thomas.g.loubrieu@jpl.nasa.gov) Sent: Thursday, July 20, 2023 12:15:59 PM To: Joyner, Ronald (US 398G) [ronald.joyner@jpl.nasa.gov](mailto:ronald.joyner@jpl.nasa.gov) Subject: Re: DOI daily review on pdscloud-gamma
Hi Ron,
I was thinking of 2 options: either delete the record from the database, or assign a new status such as 'obsolete'. In the future, we could have an administration function, with a command line to do that. What would be the criteria for giving up on DOI records? Or would that be decided individually for each DOI?
Thanks,
Thomas
From: Joyner, Ronald (US 398G) [ronald.joyner@jpl.nasa.gov](mailto:ronald.joyner@jpl.nasa.gov) Date: Thursday, July 20, 2023 at 6:32 AM To: Loubrieu, Thomas G (US 398F) [thomas.g.loubrieu@jpl.nasa.gov](mailto:thomas.g.loubrieu@jpl.nasa.gov) Subject: FW: DOI daily review on pdscloud-gamma
Howdy,
Hey Thomas. Can you please purge these records. I still want the daily email. But, these records are way old and I want a fresh start. Stay tuned for a 2nd email from a 2nd account that also needs to be purged.
Thanks RJ
-----Original Message----- From: pds4@ip-10-100-1-97.localdomain [pds4@ip-10-100-1-97.localdomain](mailto:pds4@ip-10-100-1-97.localdomain) Sent: Thursday, July 20, 2023 12:00 AM To: Joyner, Ronald (US 398G) [ronald.joyner@jpl.nasa.gov](mailto:ronald.joyner@jpl.nasa.gov); pdsen-operator@jpl.nasa.gov Subject: DOI daily review on pdscloud-gamma
[{"doi": "10.17189/btz6-5a82", "identifier": "urn:nasa:pds:mars2020_rover_places::3.0", "status": "review", "title": "Mars 2020 Rover PLACES Bundle", "submitter": "loubrieu@jpl.nasa.gov", "type": "Collection", "subtype": "PDS4 Refereed Data Bundle", "node_id": "eng", "date_added": "2022-07-29T00:21:16.004772+00:00", "date_updated": "2022-07-29T00:21:16.004772+00:00", "transaction_key": "/data/home/pds4/pds-doi-service/transaction_history/eng/10.17189/btz6-5a82/2022-07-29T00:21:16.004772+00:00", "is_latest": true}, {"doi": "10.17189/n0dm-0014", "identifier": "urn:nasa:pds:galileo-epd-cal-corrected::1.0", "status": "review", "title": "Galileo EPD Calibrated Corrected Data Bundle", "submitter": "Vivian.Tang@jpl.nasa.gov", "type": "Bundle", "subtype": "PDS4 Refereed Data Bundle", "node_id": "eng", "date_added": "2022-07-29T00:43:45.273306+00:00", "date_updated": "2022-07-29T00:43:45.273306+00:00", "transaction_key": "/data/home/pds4/pds-doi-service/transaction_history/eng/10.17189/n0dm-0014/2022-07-29T00:43:45.273306+00:00", "is_latest": true}, {"doi": "10.17189/6skx-3c53", "identifier": "PVO-V-OMAG-4--SCCOORDS-24S-V2.0b", "status": "review", "title": "PVO VENUS MAG RESAMPLED SC COORDS 24SEC AVGS V2.0", "submitter": "rsjoyner@jpl.nasa.gov", "type": "Collection", "subtype": "PDS3 Data Set", "node_id": "ppi", "date_added": "2022-10-31T18:27:51+00:00", "date_updated": "2022-10-31T18:27:51+00:00", "transaction_key": "/data/home/pds4/pds-doi-service/transaction_history/ppi/10.17189/6skx-3c53/2022-10-31T18:27:51+00:00", "is_latest": true}, {"doi": "10.17189/awd9-v380", "identifier": "PVO-V-OMAG-3-P-SENSOR-HIRES-V2.0", "status": "review", "title": "PVO VENUS MAG CALIBRATED P-SENSOR HIGH RES V2.0", "submitter": "rsjoyner@jpl.nasa.gov", "type": "Collection", "subtype": "PDS3 Data Set", "node_id": "ppi", "date_added": "2022-10-31T18:27:51+00:00", "date_updated": "2022-10-31T18:27:51+00:00", "transaction_key": 
"/data/home/pds4/pds-doi-service/transaction_history/ppi/10.17189/awd9-v380/2022-10-31T18:27:51+00:00", "is_latest": true}, {"doi": "10.17189/hkep-8z69", "identifier": "PVO-V-OMAG-4-P-SENSOR-24SEC-V2.0", "status": "review", "title": "PVO VENUS MAG RESAMPLED P-SENSOR 24SEC AVGS V2.0", "submitter": "rsjoyner@jpl.nasa.gov", "type": "Collection", "subtype": "PDS3 Data Set", "node_id": "ppi", "date_added": "2022-10-31T18:27:52+00:00", "date_updated": "2022-10-31T18:27:52+00:00", "transaction_key": "/data/home/pds4/pds-doi-service/transaction_history/ppi/10.17189/hkep-8z69/2022-10-31T18:27:52+00:00", "is_latest": true}]