NASA-PDS / validate

Validates PDS4 product labels, data and PDS3 Volumes
https://nasa-pds.github.io/validate/
Apache License 2.0

As a user, I want to be able to use both online and local schema/schematron files. #599

Closed tbarnes4 closed 1 year ago

tbarnes4 commented 1 year ago

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Anyone who validates a file/collection/bundle against an updated or new schema/schematron that has not been ingested online, while that file/collection/bundle also uses other schemas/schematrons that are available online.

💪 Motivation

So that when I invoke the -x or -S options, I do not have to locally specify every single schema/schematron that a file/bundle/collection references when only one or two are not available online. It is frustrating when I validate a bundle and the tool gives errors for schemas that are easily found online.

📖 Additional Details

This may relate to issue #513.

Whenever we update a schema/schematron file, a mission provides a new LDD, or we test a new/updated LDD, and the PDS4 product label also references other schema/schematron files that are completely valid, running the validate tool with the -x or -S option requires me to specify all schema/schematron files that may be referenced throughout my label/bundle/collection. This means I have to either track down every single reference beforehand or specify every possible schema/schematron file. These options create an all-or-nothing scenario, instead of letting me list only the missing or updated schemas/schematrons.

I would ask that when the -x or -S options are invoked, the validate tool first checks the files specified by the -x or -S options, then checks the copies posted online, and only if nothing is found reports errors as it does now.
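The requested lookup order could be sketched roughly as follows. This is only an illustration and not code from validate: the class SchemaLookupSketch, the Origin enum, and the example.org namespaces and file name are hypothetical, and a plain set of namespaces stands in for "available online".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the requested lookup order; not code from validate. */
class SchemaLookupSketch {

  /** Where a schema for a namespace was found. */
  enum Origin { LOCAL_OVERRIDE, ONLINE }

  /**
   * Resolves one namespace: files given with -x/-S win, then the online copy.
   * An empty result means the caller should report the usual error.
   */
  static Optional<Origin> resolve(String namespace,
      Map<String, String> localOverrides, Set<String> availableOnline) {
    if (localOverrides.containsKey(namespace)) {
      return Optional.of(Origin.LOCAL_OVERRIDE);
    }
    if (availableOnline.contains(namespace)) {
      return Optional.of(Origin.ONLINE);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Made-up namespace and file name standing in for a new, not yet ingested LDD.
    Map<String, String> overrides = new LinkedHashMap<>();
    overrides.put("http://example.org/new-ldd/v1", "new_ldd.xsd");
    // Assumption: this set represents namespaces whose schemas are posted online.
    Set<String> online = Set.of("http://pds.nasa.gov/pds4/pds/v1");

    String[] referenced = {
        "http://example.org/new-ldd/v1",
        "http://pds.nasa.gov/pds4/pds/v1",
        "http://example.org/unknown/v1"
    };
    for (String ns : referenced) {
      String outcome = resolve(ns, overrides, online)
          .map(Origin::name)
          .orElse("ERROR: not found");
      System.out.println(ns + " -> " + outcome);
    }
  }
}
```

With this ordering, only the new or updated files need to be listed on the command line; everything else falls through to the online copy.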

I will also note that when you do not invoke the -x or -S options and validate cannot find a schema/schematron referenced in the label, it reports a WARNING schema_reference.4: Failed to read schema document and an ERROR cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element. If you do invoke the -x or -S options and do not specify all files, it gives no warning about the missing schema but gives similar errors as before, this time for the online schemas (ex: ERROR [error.label.schema] line XYZ, AB: cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element 'disp:Display_Settings'). Only when you specify both the schemas missing online and the online schemas do you get no errors or warnings.

Acceptance Criteria

Given When I perform Then I expect

⚙️ Engineering Details

No response

jordanpadams commented 1 year ago

@tbarnes4 we will triage this to see where it fits in priority for next build. In the interim, you can find all schemas and schematrons here: https://pds.nasa.gov/datastandards/dictionaries/index-1.19.0.0.shtml

Sorry for the inconvenience

al-niessner commented 1 year ago

@jordanpadams using the example in #513 to fix this as well

al-niessner commented 1 year ago

@jordanpadams

Using a butchered version of the data on #513 because it is so simple but having some problems. The switches -S and -x clearly want files from their description via the command line:

-S,--schematron <schematron files>      Specify schematron files.
-x,--schema <schema files>              Specify schema files.

However, the code is having fits because it wants a directory:

          validatingSchema = schemaFactory.newSchema(
              loadSchemaSources(VersionInfo.getSchemasFromDirectory().toArray(new String[0]))
                  .toArray(new StreamSource[0]));

If we are allowing content in the XML to define schema and schematron and have files override specific schema and schematron, then files make more sense. If we want to retain that all schema and/or schematron must be overridden then directories make more sense. From the tickets here it seems the desire is for files (selective overrides not all or nothing overrides).
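The difference between the two behaviours can be illustrated with a small sketch. It is not taken from the validate code base: OverridePolicySketch, its method names, the example.org namespace, and the file locations are hypothetical, and each schema is reduced to a namespace-to-location map entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch contrasting the two override policies; not validate code. */
class OverridePolicySketch {

  /** All-or-nothing: once any override is given, the label references are ignored. */
  static Map<String, String> allOrNothing(Map<String, String> fromLabel,
      Map<String, String> fromCommandLine) {
    return fromCommandLine.isEmpty()
        ? new LinkedHashMap<>(fromLabel)
        : new LinkedHashMap<>(fromCommandLine);
  }

  /** Selective: start from the label references and replace only what was given. */
  static Map<String, String> selective(Map<String, String> fromLabel,
      Map<String, String> fromCommandLine) {
    Map<String, String> effective = new LinkedHashMap<>(fromLabel);
    effective.putAll(fromCommandLine);
    return effective;
  }

  public static void main(String[] args) {
    // Namespace -> location as declared in the label (locations are placeholders).
    Map<String, String> label = new LinkedHashMap<>();
    label.put("http://pds.nasa.gov/pds4/pds/v1", "online copy of the core schema");
    label.put("http://example.org/new-ldd/v1", "online copy of the LDD (not posted yet)");

    // Only the updated LDD is supplied on the command line.
    Map<String, String> cli = new LinkedHashMap<>();
    cli.put("http://example.org/new-ldd/v1", "local/new_ldd.xsd");

    System.out.println("all or nothing: " + allOrNothing(label, cli));
    System.out.println("selective:      " + selective(label, cli));
  }
}
```

Under the all-or-nothing policy the core schema drops out of the effective set as soon as one file is supplied, which matches the errors reported in this ticket; under the selective policy it is retained.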

So, do you want me to change:

  1. code to be files not directories (keep all or nothing)
  2. command line docs to be directories instead of files (keep all or nothing)
  3. code to be files not directories (override just what is given and use XML definitions for rest)
  4. command line docs to be directories (override just what is given and use XML definitions for rest)
  5. check arg for file or dir then process as 1/2 or 3/4

al-niessner commented 1 year ago

@jordanpadams

I should note that this is going to take a week of effort as the whole schema/schematron loading is going to need some rework. If the encoding stuff is more urgent let me know but it seems like there is enough time for both.

jordanpadams commented 1 year ago

@al-niessner nope. this is fine. thanks

jordanpadams commented 1 year ago

@al-niessner #3 is preferred solution:

  3. code to be files not directories (override just what is given and use XML definitions for rest)

tbarnes4 commented 1 year ago

@al-niessner @jordanpadams

@al-niessner #3 is preferred solution:

  3. code to be files not directories (override just what is given and use XML definitions for rest)

I think option 5 would be preferred. As I understand it, we currently have to specify each file (not a directory) when we use the -x or -S options. Having the option to specify a directory (perhaps in addition to, but not excluding, individual file calls) would be nice, but not required. This may be too complicated, and usually the list of changed/new schema/schematron files should be small. If I understand option 3 correctly, that should work well for us.
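Accepting either files or directories on the command line, as option 5 suggests, might look something like the sketch below. It is a hypothetical helper, not part of validate: SchemaArgSketch, expand, and the extension filter are assumptions made for illustration, and directories are listed one level deep only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of option 5 (file or directory arguments); not validate code. */
class SchemaArgSketch {

  /**
   * Expands each argument: a directory contributes its regular files ending in
   * the given extension (non-recursive, sorted); anything else is kept as a file.
   */
  static List<Path> expand(List<Path> args, String extension) throws IOException {
    List<Path> files = new ArrayList<>();
    for (Path arg : args) {
      if (Files.isDirectory(arg)) {
        try (Stream<Path> children = Files.list(arg)) {
          files.addAll(children
              .filter(Files::isRegularFile)
              .filter(p -> p.getFileName().toString().endsWith(extension))
              .sorted()
              .collect(Collectors.toList()));
        }
      } else {
        files.add(arg);
      }
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    // Treat every command-line argument as a schema file or a directory of schemas.
    List<Path> given = new ArrayList<>();
    for (String a : args) {
      given.add(Paths.get(a));
    }
    expand(given, ".xsd").forEach(System.out::println);
  }
}
```

The expanded list could then be handled exactly like individually named files, so the selective behaviour of option 3 would not need to change.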

Thanks for adding this functionality. It will greatly help our node with easing our migration efforts and upcoming missions.

I can also foresee that when we are versioning bundles/collections we will not update certain product labels (or whole collections), and so multiple build versions may be called upon for different products. Having that capability will still be nice. When I was validating last week, I noticed if I include a specific build with the -x and -S that it would exclude all other builds and suggest an update to build XYZ for the other products that contain a different build. I don't believe that this happens if I don't specify a build with the -x and -S option calls.

jordanpadams commented 1 year ago

I noticed if I include a specific build with the -x and -S that it would exclude all other builds

@tdbarnes4 that is actually intentional to "overwrite" the schemas/schematrons specified in the file so you could validate your products against the latest version of the PDS4 IM. Is there a specific reason why you are specifying the schemas via command-line instead of just pulling the online version?

tbarnes4 commented 1 year ago

@jordanpadams It was actually an accident, and I saw the results of doing it for the first time. Normally I'd never do that. I see how it would be better as you state, if you wanted to force check all files against a single build. Comment withdrawn.

On Tue, Apr 18, 2023 at 10:53 AM Jordan Padams wrote:

I noticed if I include a specific build with the -x and -S that it would exclude all other builds

@tdbarnes4 that is actually intentional to "overwrite" the schemas/schematrons specified in the file so you could validate your products against the latest version of the PDS4 IM. Is there a specific reason why you are specifying the schemas via command-line instead of just pulling the online version?

al-niessner commented 1 year ago

@jordanpadams

Sorry, but it just got more complicated. The error arises because the directories come from the core.properties file, while the direction to use the values in the properties file comes from the command line, namely from not using the force option, which is turned off automatically when -S or -x is given.

So, what about the properties file? Kill it with respect to the schema/schematron, or kill the command line? They do seem at odds or, at the least, need an explanation for anyone intending to use -S or -x and the interaction among all of the options. I can write the explanation if you tell me the desired interaction.

jordanpadams commented 1 year ago

@al-niessner can you direct me to where in the code it is actually getting a directory and/or what it is trying to do? I am not seeing anything that makes sense:

xml.version=1.0
library.version=1.14.0
pds.version=4.0
pds.default.namespace=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1B00=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1A10=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1A00=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1900=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1800=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1700=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1600=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1500=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1400=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1301=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1300=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1201=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1200=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1101=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1100=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1000=http://pds.nasa.gov/pds4/pds/v1

core.schematron.namespace=http://purl.oclc.org/dsdl/schematron

core.copyright=\nCopyright 2010-2020, by the California Institute of Technology.\nALL RIGHTS RESERVED. United States Government Sponsorship acknowledged.\nAny commercial use must be negotiated with the Office of Technology Transfer\nat the California Institute of Technology.\n\nThis software is subject to U. S. export control laws and regulations\n(22 C.F.R. 120-130 and 15 C.F.R. 730-774). To the extent that the software\nis subject to U.S. export control laws and regulations, the recipient has\nthe responsibility to obtain export licenses or other export authority as\nmay be required before exporting such information to foreign countries or\nproviding access to foreign nationals.

Regardless, I think we can probably skip looking in that file for the schema/schematron.

al-niessner commented 1 year ago

Here is the code that loads core.properties:

https://github.com/NASA-PDS/validate/blob/e62a24a35ab6622914a0fc2c1b4af95e9c759783/src/main/java/gov/nasa/pds/tools/util/VersionInfo.java#L65-L89

The confusion happens here (useLabelSchema is related to the force switch which gets set to false when -S or -x is used):

https://github.com/NASA-PDS/validate/blob/e62a24a35ab6622914a0fc2c1b4af95e9c759783/src/main/java/gov/nasa/pds/tools/label/LabelValidator.java#L652-L661

tloubrieu-jpl commented 1 year ago

We decided to temporarily remove the cucumber test for this ticket because it breaks all the other tests. A different ticket aims at re-integrating this test: https://github.com/NASA-PDS/validate/issues/633