NASA-PDS / validate

Validates PDS4 product labels, data and PDS3 Volumes
https://nasa-pds.github.io/validate/
Apache License 2.0
16 stars 11 forks source link

As a user, I want to throw an WARNING when `Time_Coordinates.start_time` does not match or come before `Time_Coordinates.stop_time` #964

Open jstone-psi opened 1 month ago

jstone-psi commented 1 month ago

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Data Archivist, Data Provider

💪 Motivation

...so that I can ensure the start time comes before the stop time.

📖 Additional Details

From original bug ticket: from @jstone-psi:

I was asked by one of our archivists if validate ensures that start_date_time occurs before stop_date_time. I tested this, and found that it does not enforce it.

I expected either an error or a warning if the stop_date_time occurred before the start_date_time.

There are cases where this may be difficult due to precision mismatches in both dates. In that case I would expect a warning regarding the difference in precision, or an attempt to compare them after accounting for the difference.

Acceptance Criteria

Given a label with a start_time > stop_time When I perform label validation Then I expect validate to raise an WARNING

⚙️ Engineering Details

matthewtiscareno commented 1 month ago

Do you mean if the stop time attribute appears in the label before the start time attribute? Or do you mean if the stop time is later in time than the start time?

jordanpadams commented 1 month ago

@jstone-psi per @matthewtiscareno comment, if it is the latter, this opens a bit of a can of worms. Per this comment:

There are cases where this may be difficult due to precision mismatches in both dates. In that case I would expect a warning regarding the difference in precision, or an attempt to compare them after accounting for the difference.

The problem here is there is nowhere in the standard that requires the precision be the same, we can add that check, but we will also need to add a flag to disable it since some people may not care about that and/or past data sets didn't take that into account.

We will need a set of test cases from the nodes in order to accurately test this (different time formats, different time precisions, invalid times, etc.). I could imagine we implement this and then get 5 bugs submitted because of corner cases we were unaware of.

jstone-psi commented 1 month ago

@matthewtiscareno @jordanpadams I did mean the latter.

I understand this is a can of worms, and that there are a lot of corner cases that we would have to deal with. This would still be valuable to have in the general case, when everything matches up correctly. (I'd like to believe that this would happen most of the time).

What would be the next step for this? Somewhere like DDWG or SWG, perhaps?

jordanpadams commented 1 month ago

@jstone-psi I will add this to our next SWG agenda. If you wouldn't mind speaking to it, and then we can ask the nodes for some of those corner cases.

jstone-psi commented 1 month ago

Understood. I will do my best to be there.

jordanpadams commented 1 month ago

@jstone-psi FYI, I revised this ticket to be a new requirement vs. a bug fix

jstone-psi commented 1 month ago

Here are some notes that I prepared for the discussion: PDS4DateOrderValidation.md

jstone-psi commented 1 month ago

Here are some examples I found in our archive:

differing precisions:

https://sbnarchive.psi.edu/pds4/non_mission/compil.ast.radar.shape-models/bundle_compil.ast.radar.shape-models.xml https://sbnarchive.psi.edu/pds4/non_mission/compil.tno-centaur.lightcurves/bundle_compil.tno-centaur.lightcurves.xml https://sbnarchive.psi.edu/pds4/non_mission/gaskell.ast-eros.shape-model/bundle_gaskell.ast-eros.shape-model.xml https://sbnarchive.psi.edu/pds4/non_mission/gaskell.ast-eros.shape-model_V1_1/bundle_gaskell.ast-eros.shape-model.xml https://sbnarchive.psi.edu/pds4/non_mission/gbo.ast-1999td10.images-lightcurves/bundle_gbo.ast-1999td10.images-lightcurves.xml https://sbnarchive.psi.edu/pds4/non_mission/gbo.ast-1999td10.images-lightcurves_V2_0/bundle_gbo.ast-1999td10.images-lightcurves.xml https://sbnarchive.psi.edu/pds4/non_mission/gbo.ast-mb.reddy.spectra/bundle_gbo.ast-mb.reddy.spectra.xml https://sbnarchive.psi.edu/pds4/non_mission/gbo.ast-neo.ondrejov.lightcurves/bundle_gbo.ast-neo.ondrejov.lightcurves.xml https://sbnarchive.psi.edu/pds4/non_mission/gbo.ast.chamberlain.sub-mm-lightcurves/bundle_gbo.ast.chamberlain.sub-mm-lightcurves.xml https://sbnarchive.psi.edu/pds4/non_mission/gbo.ast.fieber-beyer.spectra/bundle_gbo.ast.fieber-beyer.spectra.xml https://sbnarchive.psi.edu/pds4/non_mission/gbo.ast.fieber-beyer.spectra_V2_0/bundle_gbo.ast.fieber-beyer.spectra.xml https://sbnarchive.psi.edu/pds4/non_mission/gbo.ast.primass-l.spectra_V1_0/bundle_gbo.ast.primass-l.spectra.xml https://sbnarchive.psi.edu/pds4/non_mission/hst.ast-ceres.images-albedo-shape_V1_0/bundle_hst.ast-ceres.images-albedo-shape.xml https://sbnarchive.psi.edu/pds4/non_mission/ast-sat.thomas.shape-models_V1_0/bundle_ast-sat.thomas.shape-models.xml https://sbnarchive.psi.edu/pds4/non_mission/ast_spectra_reddy_neos_marscrossers_V1_0/bundle_ast_spectra_reddy_neos_marscrossers.xml

nil start date

https://sbnarchive.psi.edu/pds4/non_mission/gbo.ast.24-color-survey/bundle_gbo.ast.24-color-survey.xml

nil stop date

https://sbnarchive.psi.edu/pds4/non_mission/ast-high-inclination.gil-hutton.families/bundle_ast-high-inclination.gil-hutton.families.xml

jstone-psi commented 1 month ago

Here's one that incudes negative dates. It looks like this is from an earlier version of PDS that didn't support them, so the start date in the label is 0. All of the dates in this product are precise only to the year, so we won't be able to exercise comparing a negative date with different precision.

https://pdssbn.astro.umd.edu/holdings/pds4-compil-comet:phys_char-v1.0/data/

matthewtiscareno commented 2 weeks ago

@jstone-psi: I only spot-checked your examples, but I didn't find any where start_date_time was later than stop_date_time. Is there any reason to worry about the precision mismatch in these cases?

It seems to me that there should be an error (not a warning) if start_date_time is later than stop_date_time. I can't imagine a precision mismatch being an excuse for this, though I'm interested in any counter-examples that someone might offer. On the other hand, a precision issue could be a good reason for start_date_time to be equal to stop_date_time.

Your original proposal was simply to throw an error or a warning (I favor the former) if start_date_time is later than stop_date_time. Why not simply go forward with that and not worry so much about precision issues?

matthewtiscareno commented 2 weeks ago

By the way, your two examples of nil start_date_time or nil stop_date_time are troubling and perhaps should be fixed if not made into a validation error.

jstone-psi commented 2 weeks ago

@matthewtiscareno: The examples I posted were only to illustrate real-world labels where the precision was different (primarily for the sake of developing test cases). If I understood the issue from our archiving team correctly, then the instances where the start_date_time was later was data that was still being developed.

I agree that situations where start_date_time is later than stop_date time should be warnings/errors. The problem is that when the precision doesn't match, it becomes difficult to say definitively whether it actually is later. The example that I used in the notes above was 2023 vs 2023-12-31. Is 2023 earlier or later? It's hard to say. And this is why the issue isn't straightforward.

I would agree about the nil start_date_time or stop_date_time alone being an issue. I think that we've been ok with it when they are both nil for derived data.

matthewtiscareno commented 2 weeks ago

@jstone-psi: I suggest that such a test be implemented by simply reducing the more precise value to the precision of the less precise value. If they are equal (as in your example), then it's no problem.

So is the conclusion that Validate does not test for start time later than stop time (and probably should), but that no actual examples of such a thing are currently known?

matthewtiscareno commented 2 weeks ago

I see from recent Software WG notes that you are also interested in discussing what to do more generally when there is a precision mismatch. That's a more nuanced topic, perhaps. Should it be a separate issue?

jstone-psi commented 2 weeks ago

@matthewtiscareno