NASA-PDS / doi-service

Service and tools for generating DOIs for PDS bundles, collections, and data sets
https://nasa-pds.github.io/doi-service
Other
2 stars 3 forks source link

DOI-service application inaccurately reports LID as being invalid #336

Closed rsjoyner closed 2 years ago

rsjoyner commented 2 years ago

🐛 Describe the bug

The DOI-service s/w inaccurately identified this LID as being invalid – but I think not….

<logical_identifier>urn:nasa:pds:saturn_thermosphere_h2_density_temp</logical_identifier>
InvalidIdentifierException : The record identifier urn:nasa:pds:saturn_thermosphere_h2_density_temp::1.0 (DOI None) does not conform to a valid LIDVID format.

Reason: LIDVID field 4 (saturn_thermosphere_h2_density_temp) is invalid. Fields must begin with a letter or digit, and only consist of letters, digits, hyphens (-), underscores (_) or periods (.)

The DOI was successfully minted…

📜 To Reproduce

Steps to reproduce the behavior: mkdir ATM/Bundles_20220606

scp *.xml rsjoyner@pds-prod1:/home/pds4/input/ATM/Bundles_20220606
pds-doi-cmd release -N atm -s rsjoyner@jpl.nasa.gov -i /home/pds4/input/ATM/Bundles_20220606/ --force --no-review > /home/pds4/result_activate_no_review_ATMOS_Bundles_20220606.json

scp rsjoyner@pds-prod1:/home/pds4/result_activate_no_review_ATMOS_Bundles_20220606.json result_activate_no_review_ATMOS_Bundles_20220606.json
  1. See error

🕵️ Expected behavior

The expected behavior is that the application does not inaccurately report the valid LID as being an invalid LIDVID

📚 Version of Software Used

🩺 Test Data / Additional context

see enclosure zip file

🏞Screenshots

🖥 System Info

ATMOS_Bundles_20220606.zip


🦄 Related requirements

⚙️ Engineering Details

alexdunnjpl commented 2 years ago

The doi-service LID field validation regex pattern [a-z0-9][a-z0-9-._]{0,31} imposes length constraint of [1,32]

value 'saturn_thermosphere_h2_density_temp' has length 35

The PDS4 Standards Reference Sec. 6D.2 does not appear to impose any constraint on the length of any LID field, but Sec 6D does impose a length constraint of 255 characters total on any identifier (ie LID or LIDVID)

The doi-service pattern also applies the constraint that the first field character must be alphanumeric, though the SR does not impose such a constraint.

Suggest

@jordanpadams just wanna run this by you first, as I might be missing some context

alexdunnjpl commented 2 years ago

Have implemented suggested fixes, but will hold off on modifying tests and opening a PR until Jordan has confirmed.

alexdunnjpl commented 2 years ago

There's a test which is now failing due to removal of the requirement that the first character of each chunk be alphanumeric. Seems like there should be a check that the first three chunks are ['urn', 'nasa', 'pds']?

# Test invalid field tokens (invalid characters)
  doi_obj.pds_identifier = "urn:nasa:_pds:lab_shocked_feldspars"

  with self.assertRaises(InvalidIdentifierException):
      self._doi_validator._check_lidvid_field(doi_obj)
jordanpadams commented 2 years ago

@alexdunnjpl

Have implemented suggested fixes, but will hold off on modifying tests and opening a PR until Jordan has confirmed.

this looks great. you are go for PR 🚀

Seems like there should be a check that the first three chunks are ['urn', 'nasa', 'pds']?

I like it. let's add this check as well.