NASA-PDS / validate

Validates PDS4 product labels, data and PDS3 Volumes
https://nasa-pds.github.io/validate/
Apache License 2.0

As a user, I want to validate that all context objects specified in observational products are referenced in the parent bundle/collection Reference_List #69

Open jordanpadams opened 5 years ago

jordanpadams commented 5 years ago

Motivation

...so that we can enable quality search results when searching at the collection/bundle level

Additional Details

Reverted per https://github.com/NASA-PDS/validate/pull/456

All unique context objects specified in observational products must be referenced in the Reference_List of the parent collection and bundle. These context objects are referenced from:

Acceptance Criteria

Given a bundle with target identified as X
When I perform validation of the bundle (-R pds4.bundle) with products that have targets X and Y
Then I expect validate to throw a WARNING that not all targets are specified in the parent bundle

Given a collection J with target identified as X
When I perform validation of the bundle containing collection J (-R pds4.bundle) with products that have targets X and Y
Then I expect validate to throw a WARNING that not all targets are specified in the parent collection

Given a collection J with target identified as X
When I perform validation of the bundle containing collection J (-R pds4.bundle) with products that have targets X and Y with the --skip-context-reference-check flag
Then I expect validate to NOT throw a WARNING
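The three criteria above boil down to a set difference guarded by a skip switch. A minimal sketch of that logic, assuming the targets have already been extracted from the labels; the function name `missing_targets` and the `skip_check` parameter are hypothetical and are not part of validate:

```python
def missing_targets(parent_targets, product_targets, skip_check=False):
    """Return the targets used by child products but absent from the parent label.

    skip_check stands in for the --skip-context-reference-check flag
    (hypothetical parameter name, for illustration only).
    """
    if skip_check:
        return []
    return sorted(set(product_targets) - set(parent_targets))


# Parent label lists target X; its products reference X and Y.
print(missing_targets({"X"}, {"X", "Y"}))                   # ['Y']
print(missing_targets({"X"}, {"X", "Y"}, skip_check=True))  # []
```

Each entry in the returned list would correspond to one WARNING; an empty list means nothing to report.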

Engineering Details

Some background: per an email chain with @mitchgordon, @lynnneakrase, @rsjoyner, and @rchenatjpl, the issue arose of how to specify numerous targets within a data collection. Several alternatives were considered, including specifying the targets in the context collection and specifying a planetary_system instead of the individual targets. However, it was determined that the best way to specify targets at the collection/bundle level is to explicitly add all targets to the bundle/collection label Reference_List. This solution should apply across all context objects.

jordanpadams commented 5 years ago

email_20190815.pdf

msbentley commented 3 years ago

I haven't fully digested the email discussion, but just to summarise: the intention is that any target which appears in an observational product should be replicated in the parent collection and/or bundle reference list? For other types of context products this makes sense (investigation, host etc.) but for targets it could get... messy.

Currently I have been assuming that in the PSA we would curate the bundle label reference lists for the primary mission target(s), and not for every single target that we may have observed during a long cruise, or calibration targets etc.

I'm not against it, per se, but because we dynamically update our bundle and collection labels with every product ingestion, this would take some database work to implement etc.

jordanpadams commented 3 years ago

@msbentley the primary search scenario here is someone trying to "browse" the archive for any bundle / collection that may contain products that looked at target "X" (e.g. a browsing feature of some kind). some may be serendipitous, during a long cruise, etc., but we can't say for sure that could not influence the science. I imagine this will be a WARNING message because of the scenario you are mentioning. But we wanted to at least make people aware in the event they intended to include all targets in the bundle/collection.

thoughts?

msbentley commented 3 years ago

Yeah, I can kinda see that, but I guess I'm assuming that most people would hit a search engine/the registry and look for products with a given target. But I guess if they're coming through Google Dataset Search or something that's bringing them in at bundle or collection level, then it would be useful. If it's warning only, fine with me!

qchaupds commented 3 years ago

Are there test resources for this issue?

qchaupds commented 3 years ago

@jordanpadams Are there test resources for this ticket, or should I try to come up with some? I am NOT even sure where to start. It would help to have concrete examples to work from.

jordanpadams commented 3 years ago

further engineering details:

Going back to the diagram here: https://github.com/NASA-PDS/registry-api/issues/458:

[Figure: pds4_product_hierarchy]

Ignoring the arrows on the right side of this diagram for a moment: for referential integrity checking purposes, validate already answers the questions “What collections belong to this bundle?” and “What products belong to this collection?”. The classes that do this already exist and have been performing the referential integrity checking successfully.

What we are saying for this ticket is, starting from the bottom of the tree, all context objects referred to in Products I, J, and K should be in Collection X, all context objects in Product L should be in Collection Y, etc.

Going up the tree, all context objects referred to in Collections X, Y, and Z should be referenced in Bundle A.
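The bottom-up rule described above can be sketched as a walk over the bundle/collection/product tree that compares each label's context references with those of its direct children. This is only an illustration under stated assumptions, not validate's implementation: the `Label` class, the function name, and the LIDs in the example are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    """Hypothetical stand-in for a parsed bundle, collection, or product label."""
    lid: str
    context_refs: set
    children: list = field(default_factory=list)


def missing_context_references(parent):
    """Collect (parent_lid, missing_ref, child_lid) for every context reference
    that a direct child makes but its parent does not, then recurse downward."""
    findings = []
    for child in parent.children:
        for ref in sorted(child.context_refs - parent.context_refs):
            findings.append((parent.lid, ref, child.lid))
        findings.extend(missing_context_references(child))
    return findings


# Products I and J belong to Collection X, which belongs to Bundle A.
product_i = Label("product_i", {"target:a"})
product_j = Label("product_j", {"target:b"})
collection_x = Label("collection_x", {"target:a"}, [product_i, product_j])
bundle_a = Label("bundle_a", {"target:a"}, [collection_x])

for parent_lid, ref, child_lid in missing_context_references(bundle_a):
    print(f"WARNING: {parent_lid} should reference {ref} (used by {child_lid})")
```

In this sketch each level is compared only with its direct children, so Bundle A is not flagged for `target:b` until Collection X references it. Whether the bundle should also be checked directly against products is a design choice the sketch leaves open.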

rchenatjpl commented 3 years ago

This sounds good. I hadn't thought to check this before.

tloubrieu-jpl commented 3 years ago

We consider that the missing references discussed here will always raise a WARNING.

rchenatjpl commented 2 years ago

Is this a fail? product_observational references a target not referenced in bundle or collection. validate flags the bundle but not the collection. Much more subtle: the collection references a target that none of its products reference. val69.zip

rchenatjpl commented 2 years ago

It's probably a fail. The revised test has these lid_references to targets:

bundle: saturn narvi
collection: saturn titan
data: saturn narvi

% validate -R pds4.bundle -t val69b

PDS Validate Tool Report

Configuration:
   Version     2.2.0-SNAPSHOT
   Date        2021-10-25T23:45:19Z

Parameters:
   Targets                      [file:/Users/rchen/Desktop/val69b/]
   Rule Type                    pds4.bundle
   Severity Level               WARNING
   Recurse Directories          true
   File Filters Used            [.xml, .XML]
   Data Content Validation      on
   Product Level Validation     on
   Allow Unlabeled Files        false
   Max Errors                   100000
   Registered Contexts File     /Users/rchen/PDS4tools/validate/resources/registered_context_products.json

Product Level Validation Results

  PASS: file:/Users/rchen/Desktop/val69b/bundle-vg1-sat-pos-l1coords-1.0.xml
        1 product validation(s) completed

  PASS: file:/Users/rchen/Desktop/val69b/data-sedr/SEDR_L1.xml
        2 product validation(s) completed

  PASS: file:/Users/rchen/Desktop/val69b/data-sedr/collection-data-sedr-1.0.xml
        3 product validation(s) completed

PDS4 Bundle Level Validation Results

  PASS: file:/Users/rchen/Desktop/val69b/data-sedr/collection-data-sedr-1.0.xml
        1 integrity check(s) completed

  PASS: file:/Users/rchen/Desktop/val69b/bundle-vg1-sat-pos-l1coords-1.0.xml
      WARNING  [warning.integrity.missing_context_reference]   This file should reference 'urn:nasa:pds:context:target:satellite.saturn.narvi' because its child product with LIDVID urn:nasa:pds:vg1-saturn-pos-l1coords:data-sedr:sedr-l1::1.0 references it.
      WARNING  [warning.integrity.missing_context_reference]   This file should reference 'urn:nasa:pds:context:target:satellite.saturn.titan' because its child product with LIDVID urn:nasa:pds:vg1-saturn-pos-l1coords:data-sedr::1.0 references it.
        2 integrity check(s) completed

  PASS: file:/Users/rchen/Desktop/val69b/data-sedr/SEDR_L1.xml
        3 integrity check(s) completed

Summary:

  0 error(s)
  2 warning(s)

  Product Validation Summary:
    3 product(s) passed
    0 product(s) failed
    0 product(s) skipped

  Referential Integrity Check Summary:
    3 check(s) passed
    0 check(s) failed
    0 check(s) skipped

  Message Types:
    2 warning.integrity.missing_context_reference

End of Report
Completed execution in 4662 ms

rchenatjpl commented 2 years ago

val69b.zip

rchenatjpl commented 2 years ago

@qchaupds @jordanpadams Another probable point of failure: context products all have LIDs urn:::context:..., i.e. look for "context". The attached should generate no warnings or errors.

% validate -R pds4.bundle -t val308a

PDS Validate Tool Report

Configuration:
   Version     2.2.0-SNAPSHOT
   Date        2021-10-26T02:33:46Z

Parameters:
   Targets                      [file:/Users/rchen/Desktop/test/val308a/]
   Rule Type                    pds4.bundle
   Severity Level               WARNING
   Recurse Directories          true
   File Filters Used            [.xml, .XML]
   Data Content Validation      on
   Product Level Validation     on
   Allow Unlabeled Files        false
   Max Errors                   100000
   Registered Contexts File     /Users/rchen/PDS4tools/validate/resources/registered_context_products.json

Product Level Validation Results

  PASS: file:/Users/rchen/Desktop/test/val308a/bundle-voyager1-pls-sat-1.0.xml
        1 product validation(s) completed

  PASS: file:/Users/rchen/Desktop/test/val308a/browse-ion-moments/collection-browse-ion-moments-1.0.xml
        2 product validation(s) completed

  PASS: file:/Users/rchen/Desktop/test/val308a/browse-ion-moments/ION_MOM.xml
        3 product validation(s) completed

  PASS: file:/Users/rchen/Desktop/test/val308a/data-ion-moments-96sec/collection-data-ion-moments-96s-1.0.xml
        4 product validation(s) completed

  PASS: file:/Users/rchen/Desktop/test/val308a/data-ion-moments-96sec/ION_MOM.xml
        5 product validation(s) completed

PDS4 Bundle Level Validation Results

  PASS: file:/Users/rchen/Desktop/test/val308a/browse-ion-moments/collection-browse-ion-moments-1.0.xml
        1 integrity check(s) completed

  PASS: file:/Users/rchen/Desktop/test/val308a/bundle-voyager1-pls-sat-1.0.xml
      WARNING  [warning.integrity.missing_context_reference]   This file should reference 'urn:nasa:pds:vg1-pls-sat:data-ion-moments-96sec:ion-mom' because its child product with LIDVID urn:nasa:pds:vg1-pls-sat:browse-ion-moments:ion-mom::1.0 references it.
      WARNING  [warning.integrity.missing_context_reference]   This file should reference 'urn:nasa:pds:vg1-pls-sat:browse-ion-moments:ion-mom' because its child product with LIDVID urn:nasa:pds:vg1-pls-sat:data-ion-moments-96sec:ion-mom::1.0 references it.
        2 integrity check(s) completed

  PASS: file:/Users/rchen/Desktop/test/val308a/data-ion-moments-96sec/collection-data-ion-moments-96s-1.0.xml
        3 integrity check(s) completed

  PASS: file:/Users/rchen/Desktop/test/val308a/data-ion-moments-96sec/ION_MOM.xml
        4 integrity check(s) completed

  PASS: file:/Users/rchen/Desktop/test/val308a/browse-ion-moments/ION_MOM.xml
        5 integrity check(s) completed

Summary:

  0 error(s)
  2 warning(s)

  Product Validation Summary:
    5 product(s) passed
    0 product(s) failed
    0 product(s) skipped

  Referential Integrity Check Summary:
    5 check(s) passed
    0 check(s) failed
    0 check(s) skipped

  Message Types:
    2 warning.integrity.missing_context_reference

End of Report
Completed execution in 6189 ms

val308a.zip
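The "look for context" suggestion can be sketched as a filter applied before a reference is compared against the parent label. The sketch assumes that "context" sits in the fourth colon-separated field of the LID, as it does in the target LIDs from the earlier report; the function name `is_context_lid` is hypothetical and this is not validate's actual code.

```python
def is_context_lid(identifier):
    """Return True when the fourth colon-separated field of the LID is 'context'.

    Accepts a LID or a LIDVID; any '::version' suffix is dropped first.
    Assumption: context products always carry 'context' in that position.
    """
    lid = identifier.split("::")[0]
    fields = lid.split(":")
    return len(fields) > 3 and fields[3] == "context"


references = [
    "urn:nasa:pds:context:target:satellite.saturn.narvi",
    "urn:nasa:pds:vg1-pls-sat:data-ion-moments-96sec:ion-mom",
    "urn:nasa:pds:vg1-pls-sat:browse-ion-moments:ion-mom::1.0",
]
# Only references that point at context products are kept for the check.
print([ref for ref in references if is_context_lid(ref)])
```

With a filter like this, the two product-to-product references flagged in the val308a report would be dropped before the missing-reference comparison, so neither warning would be raised.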

jordanpadams commented 2 years ago

@rchenatjpl created a new ticket to track this at #430