NASA-PDS / harvest

Standalone Harvest client application providing the functionality for capturing and indexing product metadata into the PDS Registry system (https://github.com/nasa-pds/registry).
https://nasa-pds.github.io/registry

Unknown date format used that could not be parsed by Harvest #197

Closed tloubrieu-jpl closed 2 weeks ago

tloubrieu-jpl commented 1 month ago

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

Could not parse LDD date. Line 2312:

[WARN] Could not parse LDD date 2023-01-20T10:16:50
[ERROR] Could not parse date from 2023-01-20T10:16:50 using patterns defined in LddUtils.Accepted_LDD_DateFormats

Full log here: harvestExample3.txt.zip

🕵️ Expected behavior

~We should first check if the proposed date should be supported or if it needs to be updated in the LDD file.~

Harvest should load the datetime, even if it is invalid. (per update from @jordanpadams)

📜 To Reproduce

1. 2. 3. ...

🖥 Environment Info

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

jordanpadams commented 1 month ago

@tloubrieu-jpl do we have a specific label / product LID we can check here? I can run validate to determine if the date time is valid or not.

jordanpadams commented 1 month ago

@tloubrieu-jpl regardless, harvest should support invalid datetimes as well.

al-niessner commented 1 month ago

@jordanpadams @tloubrieu-jpl

Um, I do not see how we can support invalid date times. I think I know what you mean but "ides of march" is hard to convert and is invalid. I think you really mean we need to expand our acceptable list of formats to include this one.

Each of the patterns specified for LDD date times uses a time zone. In fact, if this date time had a time zone, it would have worked. Do you want to add a format that specifies no time zone and then treat the value as UTC?
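To make the proposal concrete, here is a minimal sketch of "accept a value without a time zone and treat it as UTC". It uses only standard `java.time` calls; the class and method names (`ZonelessDateSketch`, `parseAssumingUtc`) are hypothetical and are not taken from Harvest or `LddUtils`.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not the actual Harvest implementation.
final class ZonelessDateSketch {

    // Parses an ISO-style date time. If the text carries no offset,
    // the value is assumed to be UTC (the assumption under discussion).
    static Instant parseAssumingUtc(String text) {
        try {
            // Values with an explicit offset, e.g. 2023-01-20T10:16:50Z
            return OffsetDateTime.parse(text, DateTimeFormatter.ISO_OFFSET_DATE_TIME).toInstant();
        } catch (DateTimeParseException noOffset) {
            // Values without an offset, e.g. 2023-01-20T10:16:50
            return LocalDateTime.parse(text, DateTimeFormatter.ISO_LOCAL_DATE_TIME)
                    .toInstant(ZoneOffset.UTC);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAssumingUtc("2023-01-20T10:16:50"));
        System.out.println(parseAssumingUtc("2023-01-20T10:16:50Z"));
    }
}
```

Under this sketch both inputs resolve to the same instant, so the value reported in the log would load rather than fail.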

jordanpadams commented 1 month ago

@al-niessner correct. we need to expand our acceptable list. here are the patterns we need to support:

https://pds.nasa.gov/datastandards/documents/im/v1/index_1M00.html#19.5%C2%A0%C2%A0class_pds_ascii_date
https://pds.nasa.gov/datastandards/documents/im/v1/index_1M00.html#19.7%C2%A0%C2%A0class_pds_ascii_date_time
https://pds.nasa.gov/datastandards/documents/im/v1/index_1M00.html#19.8%C2%A0%C2%A0class_pds_ascii_date_time_doy
https://pds.nasa.gov/datastandards/documents/im/v1/index_1M00.html#19.9%C2%A0%C2%A0class_pds_ascii_date_time_doy_utc
https://pds.nasa.gov/datastandards/documents/im/v1/index_1M00.html#19.10%C2%A0%C2%A0class_pds_ascii_date_time_utc
https://pds.nasa.gov/datastandards/documents/im/v1/index_1M00.html#19.11%C2%A0%C2%A0class_pds_ascii_date_time_ymd
https://pds.nasa.gov/datastandards/documents/im/v1/index_1M00.html#19.12%C2%A0%C2%A0class_pds_ascii_date_time_ymd_utc
https://pds.nasa.gov/datastandards/documents/im/v1/index_1M00.html#19.13%C2%A0%C2%A0class_pds_ascii_date_ymd

al-niessner commented 1 month ago

These are the same as validate (I was just looking at them): https://github.com/NASA-PDS/validate/blob/e728839809fcecca88ffe2fdb32becc0a4d73e28/src/main/java/gov/nasa/pds/tools/validate/rule/pds4/DateTimeValidator.java#L44-L61

Do you want me to take more time and move them to registry-common and have validate depend on common or just copy-n-paste?

tloubrieu-jpl commented 3 weeks ago

One example of an invalid date, found by Dan Scholes, is in the LDD https://pds.nasa.gov/pds4/cart/v1/PDS4_CART_1I00_1960.JSON

The value is "2022-06-02T12:06:39"

The referencing label is https://pds-geosciences.wustl.edu/grail/grail-l-lgrs-5-rdr-v1/grail_1001/rsdmap/gggrx_1200a_anomerr_l180.xml

tloubrieu-jpl commented 3 weeks ago

@al-niessner is re-using validate code to parse the date time.

al-niessner commented 3 weeks ago

@jordanpadams @tloubrieu-jpl

Because of how the date time from the LDD is used, it needs both a date and a time, even though the regexes from the models do not require a time. Hence you will still get errors with something like 1999-102, but 1999-102T13:14:15 works and is the shortest form that will work. The same goes for YMD: it must be at least 1999-04-23T13:14:15 and cannot be briefer.

Correction: all you need is the hour. The rest can be blank: 1999-04-23T13 is sufficient, as is 1999-102T13.

Also, all times are forced to UTC.
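
For illustration only, the rule described in this comment (date plus at least an hour, YMD or day-of-year, result forced to UTC) could be sketched with `java.time` as follows. This is not the code reused from validate; `LddDateTimeSketch`, `hourOrFiner`, and `parseLddDateTime` are hypothetical names, and the two patterns cover only the examples quoted above, not the full set of PDS4 formats.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not the actual Harvest or validate implementation.
final class LddDateTimeSketch {

    // Builds a formatter that requires the date and the hour;
    // minutes, seconds, and a trailing 'Z' are optional.
    private static DateTimeFormatter hourOrFiner(String datePattern) {
        return new DateTimeFormatterBuilder()
                .appendPattern(datePattern + "'T'HH[:mm[:ss]]['Z']")
                .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
                .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
                .toFormatter();
    }

    private static final List<DateTimeFormatter> FORMATS = List.of(
            hourOrFiner("uuuu-MM-dd"),  // YMD, e.g. 1999-04-23T13
            hourOrFiner("uuuu-DDD"));   // day of year, e.g. 1999-102T13

    // Returns the parsed value forced to UTC, or empty if no pattern matches.
    static Optional<OffsetDateTime> parseLddDateTime(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(LocalDateTime.parse(text, format).atOffset(ZoneOffset.UTC));
            } catch (DateTimeParseException ignored) {
                // Try the next pattern.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parseLddDateTime("1999-04-23T13"));
        System.out.println(parseLddDateTime("1999-102T13:14:15"));
        System.out.println(parseLddDateTime("1999-102")); // date only: no match
    }
}
```

In this sketch a date-only value such as 1999-102 yields an empty result, mirroring the error described above, while hour-only values parse with minutes and seconds defaulted to zero.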