NASA-PDS / doi-service

Service and tools for generating DOIs for PDS bundles, collections, and data sets
https://nasa-pds.github.io/doi-service

Valid PDS4 xml input is converted into an invalid json that fails internal datacite validator #328

Closed: tloubrieu-jpl closed this issue 2 years ago

tloubrieu-jpl commented 2 years ago

πŸ› Describe the bug

The error is raised from a JSON label that is generated internally. The error message gives no clue as to where in the input XML file the problem originates.

📜 To Reproduce

Steps to reproduce the behavior:

Failure 1:

$ pds-doi-cmd release -s loubrieu@jpl.nasa.gov -N eng -i ~/tmp/bundle_edited.xml --force

bundle_edited2.xml.zip

Failure 2 (attachment: galileo-epd-cal-corrected.zip):

$ pds-doi-cmd release -N ppi -s rsjoyner@jpl.nasa.gov -i /home/pds4/input/PPI/Bundles_20220616/ --force --no-review > /home/pds4/result_activate_no_review_PPI_EDP_CAL_20220616.json
INFO pds_doi_service.core.cmd.pds_doi_cmd:main run_dir /home/pds4
INFO pds_doi_service.core.input.input_util:_read_from_path Reading files within directory /home/pds4/input/PPI/Bundles_20220616/
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /home/pds4/input/PPI/Bundles_20220616/bundle_galileo-epd-cal-corrected.xml
INFO pds_doi_service.core.input.input_util:parse_xml_file Parsing xml file bundle_galileo-epd-cal-corrected.xml as a PSD4 label
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /home/pds4/input/PPI/Bundles_20220616/collection-data-cms-events-deltaexe.xml
INFO pds_doi_service.core.input.input_util:parse_xml_file Parsing xml file collection-data-cms-events-deltaexe.xml as a PSD4 label
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /home/pds4/input/PPI/Bundles_20220616/collection-data-cms-events-tof.xml
INFO pds_doi_service.core.input.input_util:parse_xml_file Parsing xml file collection-data-cms-events-tof.xml as a PSD4 label
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /home/pds4/input/PPI/Bundles_20220616/collection-data-epd-channels-high-res.xml
INFO pds_doi_service.core.input.input_util:parse_xml_file Parsing xml file collection-data-epd-channels-high-res.xml as a PSD4 label
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /home/pds4/input/PPI/Bundles_20220616/collection-data-epd-channels-low-res.xml
INFO pds_doi_service.core.input.input_util:parse_xml_file Parsing xml file collection-data-epd-channels-low-res.xml as a PSD4 label
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /home/pds4/input/PPI/Bundles_20220616/collection-data-epd-channels-med-res.xml
INFO pds_doi_service.core.input.input_util:parse_xml_file Parsing xml file collection-data-epd-channels-med-res.xml as a PSD4 label
INFO pds_doi_service.core.db.doi_database:create_connection Connecting to SQLite3 (ver 2.6.0) database /home/pds4/pds-doi-service/doi.db
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Checking for existence of DOI table doi
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Executing query: SELECT count(name) FROM sqlite_master WHERE type='table' AND name='doi'
INFO pds_doi_service.core.outputs.doi_validator:_check_field_site_url Landing page URL https://pds.nasa.gov/ds-view/pds/viewBundle.jsp?identifier=urn%3Anasa%3Apds%3Agalileo-epd-cal-corrected&;version=1.0 is reachable
Traceback (most recent call last):
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 274, in run
    dois = self._validate_dois(dois)
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 214, in _validate_dois
    self._validator_service.validate(single_doi_label)
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/outputs/datacite/datacite_validator.py", line 73, in validate
    json_contents = json.loads(label_contents)
  File "/usr/local/python-3.9.5/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/python-3.9.5/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/python-3.9.5/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 142 column 578 (char 6545)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pds4/pds-doi-service/bin/pds-doi-cmd", line 8, in <module>
    sys.exit(main())
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/cmd/pds_doi_cmd.py", line 42, in main
    output = action.run(**kwargs)
  File "/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 322, in run
    raise CriticalDOIException(str(err))
pds_doi_service.core.entities.exceptions.CriticalDOIException: Expecting ',' delimiter: line 142 column 578 (char 6545)
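The traceback only reports a position ("line 142 column 578 (char 6545)") inside the internally generated JSON, which the user never sees. A minimal sketch of a more helpful diagnostic, assuming the generated label text is available as a string: Python's `json.JSONDecodeError` exposes `msg`, `pos`, `lineno`, `colno`, and `doc`, so an excerpt around the failing character can be printed. The helper name `describe_json_error` is hypothetical and not part of pds_doi_service.

```python
import json


def describe_json_error(label_contents: str, context: int = 40) -> str:
    """Hypothetical helper (not part of pds_doi_service): explain a JSON parse failure.

    Returns the decoder's message and line/column plus an excerpt of the
    generated label around the failing character.
    """
    try:
        json.loads(label_contents)
    except json.JSONDecodeError as err:
        # err.pos is the character offset into err.doc, the text being parsed
        start = max(err.pos - context, 0)
        excerpt = err.doc[start:err.pos + context]
        return f"{err.msg} at line {err.lineno}, column {err.colno}: ...{excerpt}..."
    return "label is valid JSON"


# An unescaped quote inside a value, as naive templating would produce
bad_label = '{"description": "Rover "PLACES" data", "lang": "en"}'
print(describe_json_error(bad_label))
```

Showing the excerpt makes the stray `"` visible, which points the user back to the corresponding field in the input XML.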

πŸ•΅οΈ Expected behavior

We would like to catch the error when the JSON file is generated, not when it is parsed.

We may need to sanitize the strings written into the JSON file (for example, by removing or escaping `"` characters) before it is sent.
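A minimal sketch of both ideas, assuming the label is produced by substituting values into a JSON text template. The `string.Template` fragment and the `render_label` helper are hypothetical stand-ins, not the service's actual template engine. Each value is encoded with `json.dumps`, which returns a quoted JSON string literal with `"`, `\`, and control characters escaped, and the rendered text is parsed immediately so a failure surfaces at generation time with the source file named.

```python
import json
from string import Template

# Hypothetical template fragment; the real service uses its own template files.
LABEL_TEMPLATE = Template('{"description": $description, "lang": "en"}')


def render_label(description: str, source_file: str) -> str:
    # json.dumps yields a complete, escaped JSON string literal (quotes included)
    rendered = LABEL_TEMPLATE.substitute(description=json.dumps(description))
    try:
        json.loads(rendered)  # validate now, not later in the release step
    except json.JSONDecodeError as err:
        raise ValueError(f"{source_file}: generated label is not valid JSON ({err})") from err
    return rendered


# The abstract from bundle_edited2.xml ends with a stray double quote
abstract = 'Localization (position and orientation) information for the Mars 2020 Perseverence Rover"'
print(render_label(abstract, "bundle_edited2.xml"))
```

The trailing quote comes out as `\"`, the same escaping that appears in the description field of the successful run later in this thread.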

📚 Version of Software Used

v2.2.0

🩺 Test Data / Additional context

See above


🦄 Related requirements

🦄 https://github.com/NASA-PDS/doi-service/issues/7

βš™οΈ Engineering Details

tloubrieu-jpl commented 2 years ago

@rsjoyner, this is the bug I created to track what we discussed today. Thanks!

tloubrieu-jpl commented 2 years ago

The short-term solution will be to escape the quotation marks.

@nutjob4life will also create another ticket for a longer-term solution in which we build the JSON directly in Python rather than from a template.
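A minimal sketch of the longer-term approach, assuming the payload is assembled as a Python dictionary and serialized in one step. Only a subset of the DataCite fields shown later in this thread is included, and `build_datacite_payload` is a hypothetical helper, not the service's implementation. Because `json.dumps` handles all escaping, no input string can break the structure of the output.

```python
import json


def build_datacite_payload(doi: str, title: str, description: str, url: str) -> str:
    """Hypothetical helper: serialize a subset of a DataCite DOI record."""
    payload = {
        "data": {
            "id": doi,
            "type": "dois",
            "attributes": {
                "doi": doi,
                "titles": [{"title": title, "lang": "en"}],
                "descriptions": [
                    {"description": description, "descriptionType": "Abstract", "lang": "en"}
                ],
                "url": url,
            },
        }
    }
    # Serialization escapes quotes and backslashes, so the result is always valid JSON
    return json.dumps(payload, indent=4)


print(build_datacite_payload(
    "10.17189/btz6-5a82",
    "Mars 2020 Rover PLACES Bundle",
    'Localization (position and orientation) information for the Mars 2020 Perseverence Rover"',
    "https://pds.nasa.gov/ds-view/pds/viewBundle.jsp?identifier=urn%3Anasa%3Apds%3Amars2020_rover_places&version=3.0",
))
```

Compared with text templates, this also removes the need for a separate validation pass just to catch malformed JSON.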

nutjob4life commented 2 years ago

Fix is in a pull request. Long-term issue filed.

rsjoyner commented 2 years ago

From one of the comments herein, I was able to discern the "edits" that were needed to "force" the s/w to mint the DOIs.

I will point out that had the diagnostics been a bit (or a lot) more informative/useful, I would have hand-edited the files long ago and moved on...

gxtchen commented 2 years ago

@tloubrieu-jpl @viviant100 I am still seeing the InputFormatException for the bundle_edited2.xml file after generating the JSON file (log attached: test1.log).

nutjob4life commented 2 years ago

@gxtchen this works for me with main branch as of 2022-07-12. See the following:

$ date -u
Tue Jul 12 13:26:16 UTC 2022
$ sw_vers
ProductName:    macOS
ProductVersion: 12.4
BuildVersion:   21F79
$ python3 --version
Python 3.9.13
$ git clone --quiet https://github.com/NASA-PDS/doi-service.git
$ cd doi-service
$ python3 -m venv venv
$ . venv/bin/activate
$ pip install --upgrade --quiet setuptools pip wheel build
$ pip install --quiet .
$ curl --silent --location --remote-name https://github.com/NASA-PDS/doi-service/files/8326880/bundle_edited2.xml.zip
$ unzip -q bundle_edited2.xml.zip
$ pds-doi-cmd release --submitter loubrieu@jpl.nasa.gov --node eng --input bundle_edited2.xml --force
INFO pds_doi_service.core.cmd.pds_doi_cmd:main run_dir /Users/kelly/Documents/Clients/JPL/PDS/Development/nasa-pds/doi-service
DEBUG pds_doi_service.core.outputs.service:get_doi_record_service Returning instance of DOIDataCiteRecord for service type datacite
DEBUG pds_doi_service.core.outputs.service:get_validator_service Returning instance of DOIDataCiteValidator for service type datacite
DEBUG pds_doi_service.core.outputs.service:get_web_client_service Returning instance of DOIDataCiteWebClient for service type datacite
DEBUG pds_doi_service.core.actions.action:parse_arguments input = bundle_edited2.xml
DEBUG pds_doi_service.core.actions.action:parse_arguments node = eng
DEBUG pds_doi_service.core.actions.action:parse_arguments submitter = loubrieu@jpl.nasa.gov
DEBUG pds_doi_service.core.actions.action:parse_arguments force = True
DEBUG pds_doi_service.core.actions.action:parse_arguments review = True
INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path bundle_edited2.xml
INFO pds_doi_service.core.input.input_util:parse_xml_file Parsing xml file bundle_edited2.xml as a PDS4 label
DEBUG pds_doi_service.core.input.pds4_util:get_names name_list ['Deen, Robert G.', ' Grimes, Kevin', ' Toole, Nicholas T.']
DEBUG pds_doi_service.core.input.pds4_util:get_names first_last_name_order (0, -1)
DEBUG pds_doi_service.core.input.pds4_util:_get_name_components parse full_name Deen, Robert G.
DEBUG pds_doi_service.core.input.pds4_util:_get_name_components parsed person {'first_name': 'Robert G.', 'last_name': 'Deen', 'affiliation': [], 'name_type': 'Personal'}
DEBUG pds_doi_service.core.input.pds4_util:_get_name_components parse full_name  Grimes, Kevin
DEBUG pds_doi_service.core.input.pds4_util:_get_name_components parsed person {'first_name': 'Grimes', 'last_name': 'Kevin', 'affiliation': [], 'name_type': 'Personal'}
DEBUG pds_doi_service.core.input.pds4_util:_get_name_components parse full_name  Toole, Nicholas T.
DEBUG pds_doi_service.core.input.pds4_util:_get_name_components parsed person {'first_name': 'Nicholas T.', 'last_name': 'Toole', 'affiliation': [], 'name_type': 'Personal'}
DEBUG pds_doi_service.core.input.pds4_util:_check_for_possible_full_name num_dots_found,num_person_names,len(names_list),names_list (1, 1, 2, ['Deen', ' Robert G.'])
DEBUG pds_doi_service.core.input.pds4_util:_check_for_possible_full_name o_list_contains_full_name_flag (False, ['Deen', ' Robert G.'], 2)
DEBUG pds_doi_service.core.input.pds4_util:_find_method_to_parse_authors o_best_method,pds4_fields_authors (<BestParserMethod.BY_SEMI_COLON: 2>, 'Deen, Robert G.') number_commas,number_semi_colons (1, 0)
DEBUG pds_doi_service.core.input.pds4_util:_find_method_to_parse_authors len(authors_from_comma_split),len(authors_from_semi_colon_split) (2, 1)
DEBUG pds_doi_service.core.util.general_util:create_landing_page_url Creating URL for PDS4 identifier "urn:nasa:pds:mars2020_rover_places::3.0"
DEBUG pds_doi_service.core.util.general_util:create_landing_page_url Created URL "https://pds.nasa.gov/ds-view/pds/viewBundle.jsp?identifier=urn%3Anasa%3Apds%3Amars2020_rover_places&amp;version=3.0"
DEBUG pds_doi_service.core.input.pds4_util:get_names name_list ['Deen, Robert G.']
DEBUG pds_doi_service.core.input.pds4_util:get_names first_last_name_order (-1, 0)
DEBUG pds_doi_service.core.input.pds4_util:_get_name_components parse full_name Deen, Robert G.
DEBUG pds_doi_service.core.input.pds4_util:_get_name_components parsed person {'first_name': 'Robert G.', 'last_name': 'Deen', 'affiliation': [], 'name_type': 'Personal'}
DEBUG pds_doi_service.core.util.keyword_tokenizer:__init__ initialize keyword tokenizer
DEBUG pds_doi_service.core.util.keyword_tokenizer:process_text extract keywords from Mars 2020 Perseverence Rover Mission
DEBUG pds_doi_service.core.util.keyword_tokenizer:process_text new keyword list is {'rover', 'mars', 'perseverence', 'mission', '2020'}
DEBUG pds_doi_service.core.util.keyword_tokenizer:process_text extract keywords from Mars 2020 Perseverence Rover Mars 2020 Perseverence Rover
DEBUG pds_doi_service.core.util.keyword_tokenizer:process_text new keyword list is {'rover', 'mission', '2020', 'perseverence', 'mars'}
DEBUG pds_doi_service.core.util.keyword_tokenizer:process_text extract keywords from Science Derived
DEBUG pds_doi_service.core.util.keyword_tokenizer:process_text new keyword list is {'rover', 'derived', 'mission', '2020', 'perseverence', 'mars', 'science'}
DEBUG pds_doi_service.core.util.keyword_tokenizer:process_text extract keywords from Mars
DEBUG pds_doi_service.core.util.keyword_tokenizer:process_text new keyword list is {'rover', 'derived', 'mission', '2020', 'perseverence', 'mars', 'science'}
DEBUG pds_doi_service.core.util.keyword_tokenizer:process_text extract keywords from Localization (position and orientation) information for the Mars 2020 Perseverence Rover"
DEBUG pds_doi_service.core.util.keyword_tokenizer:process_text new keyword list is {'rover', 'derived', 'mission', '2020', 'localization', 'position', 'perseverence', 'information', 'mars', 'science', 'orientation'}
INFO pds_doi_service.core.db.doi_database:create_connection Connecting to SQLite3 (ver 2.6.0) database /Users/kelly/Documents/Clients/JPL/PDS/Development/nasa-pds/doi-service/venv/doi.db
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Checking for existence of DOI table doi
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Executing query: SELECT count(name) FROM sqlite_master WHERE type='table' AND name='doi'
DEBUG pds_doi_service.core.db.doi_database:check_if_table_exists o_table_exists_flag: False
INFO pds_doi_service.core.db.doi_database:create_table Creating SQLite table "doi"
DEBUG pds_doi_service.core.db.doi_database:query_string_for_table_creation CREATE o_query_string: CREATE TABLE doi (doi TEXT NOT NULL,identifier TEXT,status TEXT NOT NULL,title TEXT,submitter TEXT,type TEXT,subtype TEXT,node_id TEXT NOT NULL,date_added INT,date_updated INT NOT NULL,transaction_key TEXT NOT NULL,is_latest BOOLEAN);
INFO pds_doi_service.core.db.doi_database:create_table Table created successfully
DEBUG pds_doi_service.core.db.doi_database:parse_criteria Calling get_query_criteria_doi with value ['10.17189/btz6-5a82']
DEBUG pds_doi_service.core.db.doi_database:_form_query_with_wildcards WHERE subclause: AND (doi IN (:doi_0) )
DEBUG pds_doi_service.core.db.doi_database:parse_criteria criteria_str: AND (doi IN (:doi_0) )
DEBUG pds_doi_service.core.db.doi_database:parse_criteria dict_entry: {'doi_0': '10.17189/btz6-5a82'}
DEBUG pds_doi_service.core.db.doi_database:select_latest_rows SELECT query_string: SELECT * from doi WHERE is_latest=1 AND (doi IN (:doi_0) ) ORDER BY date_updated
DEBUG pds_doi_service.core.db.doi_database:select_latest_rows Query returned 0 result(s)
DEBUG pds_doi_service.core.db.doi_database:parse_criteria Calling get_query_criteria_title with value ['Mars 2020 Rover PLACES Bundle']
DEBUG pds_doi_service.core.db.doi_database:_form_query_with_wildcards WHERE subclause: AND (title IN (:title_0) )
DEBUG pds_doi_service.core.db.doi_database:parse_criteria criteria_str: AND (title IN (:title_0) )
DEBUG pds_doi_service.core.db.doi_database:parse_criteria dict_entry: {'title_0': 'Mars 2020 Rover PLACES Bundle'}
DEBUG pds_doi_service.core.db.doi_database:select_latest_rows SELECT query_string: SELECT * from doi WHERE is_latest=1 AND (title IN (:title_0) ) ORDER BY date_updated
DEBUG pds_doi_service.core.db.doi_database:select_latest_rows Query returned 0 result(s)
DEBUG pds_doi_service.core.outputs.doi_validator:_check_field_title_content product_type_specific_suffix: Bundle
DEBUG pds_doi_service.core.outputs.doi_validator:_check_field_title_content doi.title: Mars 2020 Rover PLACES Bundle
DEBUG pds_doi_service.core.outputs.doi_validator:_check_field_site_url doi,site_url: 10.17189/btz6-5a82,https://pds.nasa.gov/ds-view/pds/viewBundle.jsp?identifier=urn%3Anasa%3Apds%3Amars2020_rover_places&amp;version=3.0
DEBUG pds_doi_service.core.outputs.doi_validator:_check_field_site_url from_request status_code,site_url: 200,https://pds.nasa.gov/ds-view/pds/viewBundle.jsp?identifier=urn%3Anasa%3Apds%3Amars2020_rover_places&amp;version=3.0
INFO pds_doi_service.core.outputs.doi_validator:_check_field_site_url Landing page URL https://pds.nasa.gov/ds-view/pds/viewBundle.jsp?identifier=urn%3Anasa%3Apds%3Amars2020_rover_places&amp;version=3.0 is reachable
DEBUG pds_doi_service.core.outputs.service:get_doi_record_service Returning instance of DOIDataCiteRecord for service type datacite
INFO pds_doi_service.core.db.doi_database:create_connection Connecting to SQLite3 (ver 2.6.0) database /Users/kelly/Documents/Clients/JPL/PDS/Development/nasa-pds/doi-service/venv/doi.db
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Checking for existence of DOI table doi
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Executing query: SELECT count(name) FROM sqlite_master WHERE type='table' AND name='doi'
DEBUG pds_doi_service.core.db.doi_database:check_if_table_exists o_table_exists_flag: True
DEBUG pds_doi_service.core.db.doi_database:parse_criteria Calling get_query_criteria_doi with value ['10.17189/btz6-5a82']
DEBUG pds_doi_service.core.db.doi_database:_form_query_with_wildcards WHERE subclause: AND (doi IN (:doi_0) )
DEBUG pds_doi_service.core.db.doi_database:parse_criteria criteria_str: AND (doi IN (:doi_0) )
DEBUG pds_doi_service.core.db.doi_database:parse_criteria dict_entry: {'doi_0': '10.17189/btz6-5a82'}
DEBUG pds_doi_service.core.db.doi_database:select_latest_rows SELECT query_string: SELECT * from doi WHERE is_latest=1 AND (doi IN (:doi_0) ) ORDER BY date_updated
DEBUG pds_doi_service.core.db.doi_database:select_latest_rows Query returned 0 result(s)
INFO pds_doi_service.core.db.transaction_on_disk:write Transaction files saved to /Users/kelly/Documents/Clients/JPL/PDS/Development/nasa-pds/doi-service/venv/transaction_history/eng/10.17189/btz6-5a82/2022-07-12T13:28:00.804136+00:00
DEBUG pds_doi_service.core.db.doi_database:query_string_for_is_latest_update UPDATE o_query_string: UPDATE doi SET is_latest = 0 WHERE doi = ?;
DEBUG pds_doi_service.core.db.doi_database:query_string_for_transaction_insert INSERT o_query_string: INSERT INTO doi (doi,identifier,status,title,submitter,type,subtype,node_id,date_added,date_updated,transaction_key,is_latest) VALUES (?,?,?,?,?,?,?,?,?,?,?,?)
{
    "data":
        {
            "id": "10.17189/btz6-5a82",
            "type": "dois",
            "attributes": {
                "doi": "10.17189/btz6-5a82",
                "suffix": "btz6-5a82",
                "identifiers": [
                    {
                        "identifier": "urn:nasa:pds:mars2020_rover_places::3.0",
                        "identifierType": "Site ID"
                    }
                ],
                "creators": [
                    {
                        "nameType": "Personal",
                        "name": "Robert G. Deen",
                        "nameIdentifiers": [
                        ]
                    }
                ],
                "titles": [
                    {
                        "title": "Mars 2020 Rover PLACES Bundle",
                        "lang": "en"
                    }
                ],
                "publisher": "NASA Planetary Data System",
                "publicationYear": "2022",
                "subjects": [
                    { "subject": "2020" },
                    { "subject": "PDS" },
                    { "subject": "PDS4" },
                    { "subject": "derived" },
                    { "subject": "information" },
                    { "subject": "localization" },
                    { "subject": "mars" },
                    { "subject": "mission" },
                    { "subject": "orientation" },
                    { "subject": "perseverence" },
                    { "subject": "position" },
                    { "subject": "rover" },
                    { "subject": "science" }
                ],
                "contributors": [
                    {
                        "nameType": "Personal",
                        "name": "Robert G. Deen",
                        "nameIdentifiers": [
                        ],
                        "contributorType": "Editor"
                    },
                    {
                        "nameType": "Personal",
                        "name": "Grimes Kevin",
                        "nameIdentifiers": [
                        ],
                        "contributorType": "Editor"
                    },
                    {
                        "nameType": "Personal",
                        "name": "Nicholas T. Toole",
                        "nameIdentifiers": [
                        ],
                        "contributorType": "Editor"
                    },
                    {
                        "nameType": "Organizational",
                        "name": "Planetary Data System: Engineering Node",
                        "contributorType": "DataCurator"
                    }
                ],
                "types": {
                    "resourceTypeGeneral": "Collection",
                    "resourceType": "PDS4 Refereed Data Bundle"
                },
                "relatedIdentifiers": [
                ],
                "descriptions": [
                    {
                        "description": "Localization (position and orientation) information for the Mars 2020 Perseverence Rover\"",
                        "descriptionType": "Abstract",
                        "lang": "en"
                    }
                ],
                "url": "https://pds.nasa.gov/ds-view/pds/viewBundle.jsp?identifier=urn%3Anasa%3Apds%3Amars2020_rover_places&amp;version=3.0",
                "created": "2022-07-12T13:28:00.804136Z",
                "updated": "2022-07-12T13:28:00.804136Z",
                "state": "review",
                "language": "en",
                "schemaVersion": "http://datacite.org/schema/kernel-4"
            }
        }
}
$ echo $?
0
$ echo \U+1F60E
😎
tloubrieu-jpl commented 2 years ago

@gxtchen, the fix is available in an unstable (development) version of the package; you can install it with this command:

pip install https://github.com/NASA-PDS/doi-service/releases/download/v2.2.1-dev/pds_doi_service-2.2.1.dev0-py3-none-any.whl

Is that what you used?

Thanks

gxtchen commented 2 years ago

@tloubrieu-jpl @nutjob4life Thanks, it works now. The error was caused by using an older version of Python 3.9.

tloubrieu-jpl commented 2 years ago

Perfect thanks @gxtchen