NASA-PDS / doi-service

Service and tools for generating DOIs for PDS bundles, collections, and data sets
https://nasa-pds.github.io/doi-service

Purge doi.db from test dois #411

Closed tloubrieu-jpl closed 10 months ago

tloubrieu-jpl commented 11 months ago

💡 Description

See Ron's email:

Correction: All records greater than: "date_added": "2022-07-28…

RJ

From: Joyner, Ronald (US 398G) <> Sent: Friday, August 4, 2023 8:12 AM To: Loubrieu, Thomas G (US 398F) [thomas.g.loubrieu@jpl.nasa.gov](mailto:thomas.g.loubrieu@jpl.nasa.gov) Subject: RE: DOI daily review on pdscloud-gamma

Howdy,

sent but would you have a criteria (eg time of last update)

All records greater than: "date_added": "2022-07-29…

From: Loubrieu, Thomas G (US 398F) [thomas.g.loubrieu@jpl.nasa.gov](mailto:thomas.g.loubrieu@jpl.nasa.gov) Sent: Friday, August 4, 2023 6:45 AM To: Joyner, Ronald (US 398G) [ronald.joyner@jpl.nasa.gov](mailto:ronald.joyner@jpl.nasa.gov) Subject: Re: DOI daily review on pdscloud-gamma

Hi Ron,

I could remove manually each entry that you sent but would you have a criteria (eg time of last update) on which entry should be removed? I don't think I had an answer from you to my email here.

Thanks,

Thomas

From: Loubrieu, Thomas G (US 398F) [thomas.g.loubrieu@jpl.nasa.gov](mailto:thomas.g.loubrieu@jpl.nasa.gov) Sent: Thursday, July 20, 2023 12:15:59 PM To: Joyner, Ronald (US 398G) [ronald.joyner@jpl.nasa.gov](mailto:ronald.joyner@jpl.nasa.gov) Subject: Re: DOI daily review on pdscloud-gamma

Hi Ron,

I was thinking of 2 options, either delete the record from the database, or assign a new status ‘obsolete’ or something like that. In the future, we could have an administration function, with a command line to do that. What would be the criteria to give up on DOI records ? Or would that be done individually on each DOI ?

Thanks,

Thomas

From: Joyner, Ronald (US 398G) [ronald.joyner@jpl.nasa.gov](mailto:ronald.joyner@jpl.nasa.gov) Date: Thursday, July 20, 2023 at 6:32 AM To: Loubrieu, Thomas G (US 398F) [thomas.g.loubrieu@jpl.nasa.gov](mailto:thomas.g.loubrieu@jpl.nasa.gov) Subject: FW: DOI daily review on pdscloud-gamma

Howdy,

Hey Thomas. Can you please purge these records. I still want the daily email. But, these records are way old and I want a fresh start. Stay tuned for a 2nd email from a 2nd account that also needs to be purged.

Thanks RJ

-----Original Message----- From: pds4@ip-10-100-1-97.localdomain [pds4@ip-10-100-1-97.localdomain](mailto:pds4@ip-10-100-1-97.localdomain) Sent: Thursday, July 20, 2023 12:00 AM To: Joyner, Ronald (US 398G) [ronald.joyner@jpl.nasa.gov](mailto:ronald.joyner@jpl.nasa.gov); pdsen-operator@jpl.nasa.gov Subject: DOI daily review on pdscloud-gamma

[{"doi": "10.17189/btz6-5a82", "identifier": "urn:nasa:pds:mars2020_rover_places::3.0", "status": "review", "title": "Mars 2020 Rover PLACES Bundle", "submitter": "loubrieu@jpl.nasa.gov", "type": "Collection", "subtype": "PDS4 Refereed Data Bundle", "node_id": "eng", "date_added": "2022-07-29T00:21:16.004772+00:00", "date_updated": "2022-07-29T00:21:16.004772+00:00", "transaction_key": "/data/home/pds4/pds-doi-service/transaction_history/eng/10.17189/btz6-5a82/2022-07-29T00:21:16.004772+00:00", "is_latest": true}, {"doi": "10.17189/n0dm-0014", "identifier": "urn:nasa:pds:galileo-epd-cal-corrected::1.0", "status": "review", "title": "Galileo EPD Calibrated Corrected Data Bundle", "submitter": "Vivian.Tang@jpl.nasa.gov", "type": "Bundle", "subtype": "PDS4 Refereed Data Bundle", "node_id": "eng", "date_added": "2022-07-29T00:43:45.273306+00:00", "date_updated": "2022-07-29T00:43:45.273306+00:00", "transaction_key": "/data/home/pds4/pds-doi-service/transaction_history/eng/10.17189/n0dm-0014/2022-07-29T00:43:45.273306+00:00", "is_latest": true}, {"doi": "10.17189/6skx-3c53", "identifier": "PVO-V-OMAG-4--SCCOORDS-24S-V2.0b", "status": "review", "title": "PVO VENUS MAG RESAMPLED SC COORDS 24SEC AVGS V2.0", "submitter": "rsjoyner@jpl.nasa.gov", "type": "Collection", "subtype": "PDS3 Data Set", "node_id": "ppi", "date_added": "2022-10-31T18:27:51+00:00", "date_updated": "2022-10-31T18:27:51+00:00", "transaction_key": "/data/home/pds4/pds-doi-service/transaction_history/ppi/10.17189/6skx-3c53/2022-10-31T18:27:51+00:00", "is_latest": true}, {"doi": "10.17189/awd9-v380", "identifier": "PVO-V-OMAG-3-P-SENSOR-HIRES-V2.0", "status": "review", "title": "PVO VENUS MAG CALIBRATED P-SENSOR HIGH RES V2.0", "submitter": "rsjoyner@jpl.nasa.gov", "type": "Collection", "subtype": "PDS3 Data Set", "node_id": "ppi", "date_added": "2022-10-31T18:27:51+00:00", "date_updated": "2022-10-31T18:27:51+00:00", "transaction_key": 
"/data/home/pds4/pds-doi-service/transaction_history/ppi/10.17189/awd9-v380/2022-10-31T18:27:51+00:00", "is_latest": true}, {"doi": "10.17189/hkep-8z69", "identifier": "PVO-V-OMAG-4-P-SENSOR-24SEC-V2.0", "status": "review", "title": "PVO VENUS MAG RESAMPLED P-SENSOR 24SEC AVGS V2.0", "submitter": "rsjoyner@jpl.nasa.gov", "type": "Collection", "subtype": "PDS3 Data Set", "node_id": "ppi", "date_added": "2022-10-31T18:27:52+00:00", "date_updated": "2022-10-31T18:27:52+00:00", "transaction_key": "/data/home/pds4/pds-doi-service/transaction_history/ppi/10.17189/hkep-8z69/2022-10-31T18:27:52+00:00", "is_latest": true}]

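The selection rule from the email chain (every record whose `date_added` is later than a cutoff) can be sketched against the daily report JSON. This is a minimal illustration, not part of the DOI service: the function name `records_added_after` is hypothetical, the two sample entries are abbreviated from the report above, and the cutoff time is a placeholder because the date in the email is truncated.

```python
import json
from datetime import datetime, timezone


def records_added_after(report_json: str, cutoff: datetime) -> list[dict]:
    """Return the report entries whose date_added is later than cutoff."""
    records = json.loads(report_json)
    return [
        record
        for record in records
        if datetime.fromisoformat(record["date_added"]) > cutoff
    ]


# Placeholder cutoff: the email only shows "2022-07-28…", so midnight UTC
# is an assumption made for this sketch.
CUTOFF = datetime(2022, 7, 28, tzinfo=timezone.utc)

# Two entries abbreviated from the daily report quoted in this issue.
sample_report = json.dumps([
    {"doi": "10.17189/btz6-5a82", "status": "review",
     "date_added": "2022-07-29T00:21:16.004772+00:00"},
    {"doi": "10.17189/6skx-3c53", "status": "review",
     "date_added": "2022-10-31T18:27:51+00:00"},
])

for record in records_added_after(sample_report, CUTOFF):
    print(record["doi"], record["date_added"])
```

Parsing the timestamps into timezone-aware `datetime` objects avoids relying on string ordering, which matters if some records carry a different UTC offset.
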
tloubrieu-jpl commented 10 months ago

I would consider changing the priority for this ticket https://github.com/NASA-PDS/doi-service/issues/8 to implement the deactivation of a DOI.

The other options I looked at, but dismissed are:

Question for @alexdunnjpl and @collinss-jpl: if we assign a 'deactivated' status to a DOI in a local doi.db SQLite database, it is not going to be overwritten by the daily synchronization because the record at DataCite has not been updated. Is that correct?
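
The two local options raised in this thread (flagging rows with a new status, or deleting them) might look like the following against SQLite. This is a sketch under stated assumptions: only the table name `doi` and the column names come from the log and the report in this issue; the column types, the ISO-8601 text dates, and the helper names `deactivate` and `purge` are hypothetical, and the real doi.db schema may differ. It also says nothing about whether the daily synchronization would later overwrite the change.

```python
import sqlite3

# Assumed, simplified schema: column types and text dates are guesses.
SCHEMA = """
CREATE TABLE doi (
    doi TEXT, identifier TEXT, status TEXT, date_added TEXT, is_latest INTEGER
)
"""


def deactivate(conn: sqlite3.Connection, cutoff: str) -> int:
    """Option A: keep the rows but flag them with a 'deactivated' status."""
    cur = conn.execute(
        "UPDATE doi SET status = 'deactivated' WHERE date_added > ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


def purge(conn: sqlite3.Connection, cutoff: str) -> int:
    """Option B: delete the rows outright."""
    cur = conn.execute("DELETE FROM doi WHERE date_added > ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# In-memory stand-in for doi.db, filled with two entries from the report.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO doi VALUES (?, ?, ?, ?, ?)",
    [
        ("10.17189/btz6-5a82", "urn:nasa:pds:mars2020_rover_places::3.0",
         "review", "2022-07-29T00:21:16.004772+00:00", 1),
        ("10.17189/n0dm-0014", "urn:nasa:pds:galileo-epd-cal-corrected::1.0",
         "review", "2022-07-29T00:43:45.273306+00:00", 1),
    ],
)
conn.commit()

# The cutoff is passed as a string prefix; text comparison only orders
# correctly if every stored date uses the same format and UTC offset.
print("deactivated:", deactivate(conn, "2022-07-28"))
print("purged:", purge(conn, "2022-07-28"))
```

Working on a copy of doi.db first would make either option easy to undo.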

alexdunnjpl commented 10 months ago

@tloubrieu-jpl hard to say - I'd need to test it out to be sure but based on my memory of how it's supposed to work, that seems plausible.

tloubrieu-jpl commented 10 months ago

@tloubrieu-jpl will check the gamma deployment to see why these DOIs are sent in the report and "remove" them by changing their status.

tloubrieu-jpl commented 10 months ago

@rsjoyner @c-suh @jordanpadams I confirm the DOIs Ron is seeing in his daily reports come from pdscloud-gamma, where their status is 'review' (instead of 'findable' in production).

3 questions so far:

1) I see that on pds-gamma we synchronize all the DOIs from the production prefix "10.17189" daily, with a command in crontab. Taking a naive view (having forgotten everything I've done in the past with the DOI service), it seems odd that we import production records into a test database. Can you remind me why we do that? Or should we not?

2) I don't understand why the status is 'review' in the local gamma database whereas it is 'findable' in the DataCite system.

3) I tried to update the status of the record with the command below, but that was not successful. My guess is that the gamma deployment is only authorized to work with the test DataCite prefix.

$ pds-doi-cmd release -i /data/home/pds4/pds-doi-service/transaction_history/eng/10.17189/n0dm-0014/2022-07-29T00\:43\:45.273306+00\:00/output.json --no-review --submitter loubrieu@jpl.nasa.gov
INFO pds_doi_service.core.util.logging:_get_config Searching for configuration files from candidates ['/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/util/conf.default.ini', '/data/home/pds4/pds-doi-service/pds_doi_service.ini']
INFO pds_doi_service.core.util.logging:_get_config Using configs (with later files overwriting previous files' values): ['/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/util/conf.default.ini', '/data/home/pds4/pds-doi-service/pds_doi_service.ini']
INFO pds_doi_service.core.cmd.pds_doi_cmd:main run_dir /data/home/pds4
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /data/home/pds4/pds-doi-service/transaction_history/eng/10.17189/n0dm-0014/2022-07-29T00:43:45.273306+00:00/output.json
INFO pds_doi_service.core.input.input_util:parse_json_file Parsing json file output.json
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
WARNING pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Record 0: Could not parse optional field "rights_list"
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.db.doi_database:create_connection Connecting to SQLite3 (ver 2.6.0) database /data/home/pds4/pds-doi-service/doi.db
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Checking for existence of DOI table doi
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Executing query: SELECT count(name) FROM sqlite_master WHERE type='table' AND name='doi'
INFO pds_doi_service.core.outputs.doi_validator:_check_field_site_url Landing page URL https://pds.nasa.gov/ds-view/pds/viewBundle.jsp?identifier=urn%3Anasa%3Apds%3Agalileo-epd-cal-corrected&version=1.0 is reachable
Traceback (most recent call last):
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/web_client.py", line 89, in _submit_content
    response.raise_for_status()
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.test.datacite.org/dois/10.17189/n0dm-0014

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 286, in run
    output_doi, o_doi_label = self._web_client.submit_content(
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/datacite/datacite_web_client.py", line 93, in submit_content
    response_text = super()._submit_content(
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/web_client.py", line 95, in _submit_content
    raise WebRequestException(
pds_doi_service.core.entities.exceptions.WebRequestException: DOI submission request to DataCite service failed, reason: 403 Client Error: Forbidden for url: https://api.test.datacite.org/dois/10.17189/n0dm-0014
Details: ('{"errors":[{"status":"403","title":"You are not authorized to access this '
 'resource."}]}')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/home/pds4/pds-doi-service/bin/pds-doi-cmd", line 8, in <module>
    sys.exit(main())
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/cmd/pds_doi_cmd.py", line 42, in main
    output = action.run(**kwargs)
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 322, in run
    raise CriticalDOIException(str(err))
pds_doi_service.core.entities.exceptions.CriticalDOIException: DOI submission request to DataCite service failed, reason: 403 Client Error: Forbidden for url: https://api.test.datacite.org/dois/10.17189/n0dm-0014
Details: ('{"errors":[{"status":"403","title":"You are not authorized to access this '
 'resource."}]}')

Next, I am going to investigate why the status in the local database does not match the status at DataCite (question 2).
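
If the guess in question 3 is right, a pre-flight check on the DOI prefix would catch the mismatch before any request reaches DataCite. The sketch below is illustrative only: `doi_prefix`, `can_submit`, and the value in `TEST_PREFIXES` are hypothetical, and the thread does not state which prefix the gamma account is actually authorized for.

```python
def doi_prefix(doi: str) -> str:
    """Return the registrant prefix, i.e. the part of the DOI before the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix


def can_submit(doi: str, allowed_prefixes: set[str]) -> bool:
    """True if the DOI's prefix is one the deployment is allowed to write to."""
    return doi_prefix(doi) in allowed_prefixes


# Hypothetical placeholder; the real test prefix is not given in this issue.
TEST_PREFIXES = {"10.99999"}

print(can_submit("10.17189/n0dm-0014", TEST_PREFIXES))  # False
```

A check like this turns a 403 from the remote service into a local, explicit error message.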

tloubrieu-jpl commented 10 months ago

@collinss-jpl @alexdunnjpl any thoughts on my comment above ?

collinss-jpl commented 10 months ago

My memory is also a bit fuzzy on this, but here are my answers

For question 1: it is a bit odd that we pull the production DOIs into the gamma database. Maybe we set this up as a way to test the synchronization script before we had DOIs available in the production DataCite server. If the synchronization is now occurring for real with our production DOI service, we could probably disable the crontab on gamma.

For question 2: I noticed from your traceback that the DOI service deployment on gamma is configured to talk to https://api.test.datacite.org, rather than the actual production DataCite API (https://api.datacite.org/dois/). This could explain why the DOI in production is findable, whereas on gamma it looks like it's still in review.

For question 3: When I try to do a GET on https://api.test.datacite.org/dois/10.17189/n0dm-0014 via my browser, I get a 404 back, meaning the record does not actually exist in the test DataCite environment. This is probably why you get a 403 back when trying to make an update to the record. Since the DOI does not actually exist in the test DataCite environment, it's probably safe to just purge the record in the local database on gamma.
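
The browser check described for question 3 can be scripted so that it is repeatable for every DOI in the report. This is a minimal sketch using only the standard library: the two base URLs are the ones quoted in this thread, while `doi_url`, `fetch_status`, and `doi_exists` are hypothetical helper names. The demonstration at the end replays the 404 reported above through a stub instead of calling the network.

```python
import urllib.error
import urllib.request

TEST_API = "https://api.test.datacite.org/dois/"
PROD_API = "https://api.datacite.org/dois/"


def doi_url(api_base: str, doi: str) -> str:
    """Build the record URL for a DOI on the given DataCite API base."""
    return api_base.rstrip("/") + "/" + doi


def fetch_status(url: str) -> int:
    """Issue an unauthenticated GET and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def doi_exists(api_base: str, doi: str, fetch=fetch_status) -> bool:
    """Interpret 200 as 'visible' and 404 as 'not found'; anything else is an error."""
    status = fetch(doi_url(api_base, doi))
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected HTTP status {status} for {doi}")


# Stubbed fetch replaying the 404 observed on the test environment.
print(doi_exists(TEST_API, "10.17189/n0dm-0014", fetch=lambda url: 404))  # False
```

Dropping the `fetch` argument makes `doi_exists` perform a real request. A 404 from an unauthenticated request only shows that the record is not publicly visible on that endpoint.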

tloubrieu-jpl commented 10 months ago

Thanks @collinss-jpl. You are right, we need to clarify why we are importing production DOIs into our pre-prod database.

For question 2, I am thinking the JSON of these records might be corrupted in a way that prevents our synchronization code from importing it into the local database.

I tried to import the JSON that I copied manually from DataCite:


$ pds-doi-cmd release -i ~/tmp/doi_pb.json --submitter loubrieu@jpl.nasa.gov --no-review
INFO pds_doi_service.core.util.logging:_get_config Searching for configuration files from candidates ['/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/util/conf.default.ini', '/data/home/pds4/pds-doi-service/pds_doi_service.ini']
INFO pds_doi_service.core.util.logging:_get_config Using configs (with later files overwriting previous files' values): ['/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/util/conf.default.ini', '/data/home/pds4/pds-doi-service/pds_doi_service.ini']
INFO pds_doi_service.core.cmd.pds_doi_cmd:main run_dir /data/home/pds4/pds-doi-service
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /home/pds4/tmp/doi_pb.json
INFO pds_doi_service.core.input.input_util:parse_json_file Parsing json file doi_pb.json
WARNING pds_doi_service.core.input.input_util:parse_json_file Unable to parse DOI objects from provided json file "/home/pds4/tmp/doi_pb.json"
Reason: JSON record at index 0 does not appear to be in DataCite format.
Please ensure the label is valid DataCite JSON (as opposed to OSTI-format).
Traceback (most recent call last):
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/input/input_util.py", line 488, in parse_json_file
    validator.validate(json_contents)
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/datacite/datacite_validator.py", line 109, in validate
    raise InputFormatException(error_message)
pds_doi_service.core.entities.exceptions.InputFormatException: JSON record at index 0 does not appear to be in DataCite format.
Please ensure the label is valid DataCite JSON (as opposed to OSTI-format).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/home/pds4/pds-doi-service/bin/pds-doi-cmd", line 8, in <module>
    sys.exit(main())
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/cmd/pds_doi_cmd.py", line 42, in main
    output = action.run(**kwargs)
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 313, in run
    raise err
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 272, in run
    dois = self._parse_input(self._input)
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 129, in _parse_input
    return self._input_util.parse_dois_from_input_file(input_file)
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/input/input_util.py", line 640, in parse_dois_from_input_file
    dois = self._read_from_path(input_file)
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/input/input_util.py", line 533, in _read_from_path
    dois = read_function(path)
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/input/input_util.py", line 494, in parse_json_file
    raise InputFormatException(msg)
pds_doi_service.core.entities.exceptions.InputFormatException: Unable to parse DOI objects from provided json file "/home/pds4/tmp/doi_pb.json"
Reason: JSON record at index 0 does not appear to be in DataCite format.
Please ensure the label is valid DataCite JSON (as opposed to OSTI-format).

tloubrieu-jpl commented 10 months ago

Oh no, I think we need a wrapper around the DOI record. Let me try that.

tloubrieu-jpl commented 10 months ago

@tloubrieu-jpl will drop the DOI database on gamma.

alexdunnjpl commented 10 months ago

@tloubrieu-jpl just got back - ping me if you need further action/investigation from me

tloubrieu-jpl commented 10 months ago

I re-initialized the gamma DOI database and there are no longer any DOIs in review, so @rsjoyner's daily report should be empty.