NASA-PDS / registry-mgr

Standalone Registry Manager application responsible for managing the PDS Registry (https://github.com/NASA-PDS/registry) schemas and indexes.
https://nasa-pds.github.io/registry

Load-data command doesn't report Elasticsearch errors #25

Closed: tdddblog closed this issue 3 years ago

tdddblog commented 3 years ago

πŸ› Describe the bug

The registry manager should report errors returned by the Elasticsearch API.

📜 To Reproduce

Steps to reproduce the behavior:

  1. Harvest a label with a date field in an invalid or unsupported format, e.g., <cassini:earth_received_start_time>1999-010T05:44:46.821</cassini:earth_received_start_time> NOTE: This date cannot be loaded into Elasticsearch unless it is first converted to "ISO instant" format. Harvest does this conversion, but only for fields that have "date" in their names.
  2. Make sure that cassini:earth_received_start_time has the data type "date" in the Elasticsearch data dictionary index (registry-dd).
  3. Load the harvested data with the registry manager.
  4. The registry manager output shows no errors, but the document is not loaded into Elasticsearch.
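The NOTE in step 1 hinges on the value's format: `1999-010T05:44:46.821` is a day-of-year (DOY) timestamp (year 1999, day 010), which Elasticsearch failed to parse as a `date` here. Below is a minimal Python sketch of that kind of conversion, assuming the value is UTC and uses either the DOY or the calendar (YYYY-MM-DD) form. The function name `to_iso_instant` is hypothetical; this is not Harvest's actual code.

```python
from datetime import datetime, timezone

# Input formats tried in order (an assumption for this sketch):
# PDS day-of-year form first, then calendar form, each with/without fraction.
_FORMATS = (
    "%Y-%jT%H:%M:%S.%f",
    "%Y-%jT%H:%M:%S",
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S",
)


def to_iso_instant(value: str) -> str:
    """Convert a PDS-style timestamp to an ISO-8601 instant string.

    Assumes the value is UTC; a trailing 'Z' is accepted and stripped.
    Raises ValueError if no known format matches.
    """
    text = value.strip().removesuffix("Z")
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Attach UTC explicitly, then render with millisecond precision.
        instant = parsed.replace(tzinfo=timezone.utc)
        return instant.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    raise ValueError(f"unsupported date/time format: {value!r}")
```

For example, `to_iso_instant("1999-010T05:44:46.821")` returns `"1999-01-10T05:44:46.821Z"`, while a mission counter such as `28563724162614` raises `ValueError` instead of being silently misread as a date.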

πŸ•΅οΈ Expected behavior

Registry manager should report an error similar to this:

failed to parse field [cassini:VIMS_Specific_Attributes/cassini:earth_received_start_time] 
of type [date] in document with id 'urn:nasa:pds:cassini_vims_cruise:data_raw:1294638283::1.0'. 
Preview of field's value: '1999-010T05:44:46.821'
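One plausible reason the failure is silent, assuming load-data sends documents through the Elasticsearch `_bulk` API: `_bulk` answers with HTTP 200 even when individual documents are rejected, and signals failures through a top-level `errors` flag plus an `error` object on each failed item. A minimal Python sketch of scanning such a response and producing messages like the one above; `bulk_errors` is a hypothetical helper, not part of registry-mgr.

```python
import json


def bulk_errors(response_body: str) -> list[str]:
    """Return one message per failed item in an Elasticsearch _bulk response."""
    response = json.loads(response_body)
    # Fast path: Elasticsearch sets "errors" to false when every item succeeded.
    if not response.get("errors", False):
        return []
    messages = []
    for item in response.get("items", []):
        # Each item has a single key naming the action: index, create, update or delete.
        for action, result in item.items():
            error = result.get("error")
            if error is None:
                continue
            messages.append(
                f"{action} failed for id '{result.get('_id')}' "
                f"(status {result.get('status')}): "
                f"{error.get('type')}: {error.get('reason')}"
            )
    return messages
```

A caller would log every returned message and exit with a non-zero status when the list is non-empty, so a rejected document can no longer go unnoticed.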

🩺 Test Data / Additional context

1294638283.zip

We also have to figure out how to fix Harvest, probably by introducing a "list of date fields" option in the Harvest configuration file. "Time" fields are not converted automatically because there are many different mission-specific formats, such as:

<ladee:integration_time>15</ladee:integration_time>
<ladee:raw_timestamp>28563724162614</ladee:raw_timestamp>
<ladee:uvs_timestamp>46330</ladee:uvs_timestamp>
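The proposed "list of date fields" option could work roughly as follows: only fields named in the configuration are converted, so mission-specific counters such as `ladee:raw_timestamp` pass through unchanged. The configured set `DATE_FIELDS` and the helper names are hypothetical, matching is done on the last path segment of the field name (an assumption, so that the full path shown in the error message also matches), and only the DOY form from step 1 is handled.

```python
from datetime import datetime, timezone

# Hypothetical value of a "date fields" option in the Harvest configuration.
DATE_FIELDS = {"cassini:earth_received_start_time"}


def doy_to_iso_instant(value: str) -> str:
    """Convert 'YYYY-DDDThh:mm:ss.fff' (assumed UTC) to an ISO-8601 instant."""
    parsed = datetime.strptime(value, "%Y-%jT%H:%M:%S.%f")
    instant = parsed.replace(tzinfo=timezone.utc)
    return instant.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def convert_fields(doc: dict[str, str], date_fields: set[str]) -> dict[str, str]:
    """Return a copy of doc with only the configured date fields converted.

    A field matches if the last '/'-separated segment of its name is listed.
    """
    return {
        name: doy_to_iso_instant(value)
        if name.rsplit("/", 1)[-1] in date_fields
        else value
        for name, value in doc.items()
    }
```

The explicit allow-list trades convenience for safety: nothing is reinterpreted as a date unless someone has said it is one.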

🦄 Applicable requirements

tloubrieu-jpl commented 3 years ago

For now, @tdddblog will do a quick fix in the Harvest configuration.

For a more sustainable solution, a separate ticket will be created: we'll use the PDS4 schema information to determine the data type of each field.
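The schema-based approach would key the conversion off each attribute's PDS4 data type rather than its name. The PDS4 common dictionary defines date/time types such as `ASCII_Date_Time_DOY_UTC` and `ASCII_Date_Time_YMD_UTC`; how Harvest would look these types up is left open here, so the sketch simply takes the type as an argument. The pattern table covers only a subset of PDS4 date/time types, the function name `normalize` is hypothetical, and all values are treated as UTC.

```python
from datetime import datetime, timezone

# strptime patterns for a subset of PDS4 date/time types (fraction added on demand).
TYPE_PATTERNS = {
    "ASCII_Date_Time_DOY": "%Y-%jT%H:%M:%S",
    "ASCII_Date_Time_DOY_UTC": "%Y-%jT%H:%M:%S",
    "ASCII_Date_Time_YMD": "%Y-%m-%dT%H:%M:%S",
    "ASCII_Date_Time_YMD_UTC": "%Y-%m-%dT%H:%M:%S",
}


def normalize(value: str, pds4_type: str) -> str:
    """Convert value to an ISO instant if its PDS4 type is a known date/time type.

    Values of other types (e.g. ASCII_Integer) are returned unchanged.
    """
    pattern = TYPE_PATTERNS.get(pds4_type)
    if pattern is None:
        return value
    text = value.removesuffix("Z")
    if "." in text:
        # Fractional seconds are present.
        pattern += ".%f"
    parsed = datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
    return parsed.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

Because the decision comes from the declared type, fields like `ladee:raw_timestamp` would stay untouched as long as the dictionary types them as integers, with no per-mission configuration needed.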