NASA-PDS / validate

Validates PDS4 product labels, data and PDS3 Volumes
https://nasa-pds.github.io/validate/
Apache License 2.0

As a user, I want to support bit patterns within Special_Constants values #651

Closed jordanpadams closed 1 year ago

jordanpadams commented 1 year ago

Checked for duplicates

No - I haven't checked

🧑‍🔬 User Persona(s)

Data Engineer, Data Provider

💪 Motivation

...so that I can use bit patterns in most (if not all) Special_Constants attributes

📖 Additional Details

No response

Acceptance Criteria

Given When I perform Then I expect

⚙️ Engineering Details

Related to discussion starting here: https://github.com/NASA-PDS/validate/issues/611#issuecomment-1555457236

jordanpadams commented 1 year ago

Status: Needs input from DDWG

al-niessner commented 1 year ago

@jordanpadams

This has been put into the sprint backlog for work. What is the resolution of the DDWG? Do we do 16#ef# or 0xef like the rest of the world or both?

jordanpadams commented 1 year ago

bah. ok. let me check

kbowley-asu commented 1 year ago

As an affected party, I vote for the 0xef format like the rest of the world. The 16#ef# format was a holdover from the PDS3 labels, and is very easy to convert to a standard format. We've been unable to use any version of validate after 3.1.1 due to this issue.
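For labels that still carry the PDS3-style notation, the conversion mentioned above could be sketched roughly as follows. This is a minimal illustration only, assuming values have the form `<radix>#<digits>#` (as in `16#ef#`); the function name `pds3_to_hex` is hypothetical and is not part of validate.

```python
import re

# Assumed shape of a PDS3 radix value: <radix>#<digits>#, e.g. 16#ef#
_PDS3_RADIX = re.compile(r"^(\d+)#([0-9A-Za-z]+)#$")


def pds3_to_hex(value: str) -> str:
    """Convert a PDS3-style radix value such as '16#ef#' to '0xEF'.

    Hypothetical helper for illustration; leading zeros are not preserved.
    """
    match = _PDS3_RADIX.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDS3 radix value: {value!r}")
    radix = int(match.group(1))
    digits = match.group(2)
    # int() raises ValueError if the digits are invalid for the radix
    return "0x" + format(int(digits, radix), "X")


print(pds3_to_hex("16#ef#"))
```

Because the result is rebuilt from the integer value, a pattern whose width matters (for example one with leading zeros) would need padding to the width of the data type.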

al-niessner commented 1 year ago

@kbowley-asu

Do you have a small example that I could use to understand the full extent of what you are requesting? Is it that you can no longer put 16#ef# into the XML, that validate is no longer processing it well, or a bit of both?

kbowley-asu commented 1 year ago

Our PDS3 labels have the funky 16#@..# formatted values, so originally we had those in our PDS4 labels, but after validate broke being able to validate anything, we looked closer and were more than happy to change to the standard format. Here's an example from what we are currently generating:

    <Array_2D_Image>
      <local_identifier>Array_2D_Image</local_identifier>
      <offset unit="byte">10560</offset>
      <axes>2</axes>
      <axis_index_order>Last Index Fastest</axis_index_order>
      <Element_Array>
        <data_type>IEEE754LSBSingle</data_type>
        <unit>I/F</unit>
      </Element_Array>
      <Axis_Array>
        <axis_name>Line</axis_name>
        <elements>20748</elements>
        <sequence_number>1</sequence_number>
      </Axis_Array>
      <Axis_Array>
        <axis_name>Sample</axis_name>
        <elements>704</elements>
        <sequence_number>2</sequence_number>
      </Axis_Array>
      <Special_Constants>
        <missing_constant>0xFF7FFFFB</missing_constant>
        <high_instrument_saturation>0xFF7FFFFE</high_instrument_saturation>
        <high_representation_saturation>0xFF7FFFFF</high_representation_saturation>
        <valid_minimum>0xFF7FFFFA</valid_minimum>
        <low_instrument_saturation>0xFF7FFFFD</low_instrument_saturation>
        <low_representation_saturation>0xFF7FFFFC</low_representation_saturation>
      </Special_Constants>
    </Array_2D_Image>

al-niessner commented 1 year ago

@kbowley-asu

If you use a validating XML editor with the PDS4 schema, does this XML with the 0xef style validate? Does it also pass the PDS4 schematron? If you do not understand either of those questions, then a product example along with referenced files would be nice. Keeping it to a few 10s of KB would be nice too but I will take whatever I can get right now.

kbowley-asu commented 1 year ago

I understand those references, but have not run any of the labels through any other validation (besides general xml parsing engines like nokogiri), and I use vim (with syntax highlighting) for editing. The image for the label I pulled that example out of is only 56M, so relatively very small. https://pds.lroc.asu.edu/data/LRO-L-LROC-3-CDR-V1.0/LROLRC_1054C/DATA/ESM5/2023047/WAC/M1431146123CC.xml and https://pds.lroc.asu.edu/data/LRO-L-LROC-3-CDR-V1.0/LROLRC_1054C/DATA/ESM5/2023047/WAC/M1431146123CC.IMG

al-niessner commented 1 year ago

@jordanpadams @kbowley-asu

I see. With content validation skipped, validate.sh passes the label, which means the syntax is legal from the schema and schematron standpoint. Since we are not supposed to re-check what the schema/schematron already checks, I am going to add the processing for 0x now, and if it becomes illegal from schema/schematron changes then what to do will become clear.

kbowley-asu commented 1 year ago

Yes... I guess I should have clarified that validate is perfectly happy if I use the --skip-content-validation option.

jordanpadams commented 1 year ago

@al-niessner sounds good! Thanks!

al-niessner commented 1 year ago

I have a working fix for this example. It can be expanded quickly if not all primitive types are covered.

al-niessner commented 1 year ago

@jordanpadams @kbowley-asu

Okay, my mistake. The fix is horrible. It is totally susceptible to bit rounding, but not in the way we might think: I am getting two numbers that seem the same but are not. The minimum is 0xFF7FFFFA, and after reading the data in from the image a value shows up as 0xFF7FFFFB which, as it turns out, is less than the limit. So, bit patterns may not be all that helpful. There are lots of such points in the file you referenced earlier today. @kbowley-asu can you verify (I like independent verification) that there are or are not floats in the file that are 0xFF7FFFFB?
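The ordering described above can be checked independently of validate. The following is a small sketch, assuming the patterns are read as 32-bit IEEE 754 singles (the label declares `IEEE754LSBSingle`); the helper name `bits_to_float` is hypothetical.

```python
import struct


def bits_to_float(pattern: str) -> float:
    """Interpret a hex bit pattern such as '0xFF7FFFFA' as an IEEE 754 single."""
    raw = int(pattern, 16).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


valid_minimum = bits_to_float("0xFF7FFFFA")
from_image = bits_to_float("0xFF7FFFFB")

# Both are very large negative numbers that differ by one unit in the last place.
print(repr(valid_minimum), repr(from_image))

# The sign bit is set, so the larger bit pattern is the more negative float.
print(from_image < valid_minimum)
```

Since both patterns have the sign bit set, the one with the larger mantissa has the larger magnitude and is therefore the smaller value, which is why a sample of 0xFF7FFFFB falls below a valid_minimum of 0xFF7FFFFA when compared as floats.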

kbowley-asu commented 1 year ago

0xFF7FFFFB is listed as the missing_constant in the Special_Constants. My understanding is that anything below valid_minimum is not part of the valid data and should be defined by one of the Special_Constants.
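One way to express that understanding is to compare the raw bits of each sample against the Special_Constants patterns before applying any range check. The sketch below uses the values from the label example above; it is not validate's implementation, and the names `SPECIAL_BITS`, `VALID_MINIMUM_BITS`, and `classify` are hypothetical.

```python
import struct

# Flag patterns taken from the Special_Constants block in the example label.
SPECIAL_BITS = {
    0xFF7FFFFB: "missing_constant",
    0xFF7FFFFE: "high_instrument_saturation",
    0xFF7FFFFF: "high_representation_saturation",
    0xFF7FFFFD: "low_instrument_saturation",
    0xFF7FFFFC: "low_representation_saturation",
}
VALID_MINIMUM_BITS = 0xFF7FFFFA


def classify(raw: bytes) -> str:
    """Classify one 4-byte little-endian sample (IEEE754LSBSingle)."""
    bits = int.from_bytes(raw, "little")
    # Exact bit comparison first, so special values never reach the range check.
    if bits in SPECIAL_BITS:
        return SPECIAL_BITS[bits]
    value = struct.unpack("<f", raw)[0]
    minimum = struct.unpack("<f", VALID_MINIMUM_BITS.to_bytes(4, "little"))[0]
    return "valid" if value >= minimum else "below valid_minimum"


# 0xFF7FFFFB stored least significant byte first, as it would appear in the file.
print(classify(bytes.fromhex("fbff7fff")))
```

Matching on the integer bit pattern avoids any float rounding in the comparison; only samples that are not special constants are converted to floats and checked against valid_minimum.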

al-niessner commented 1 year ago

@kbowley-asu thank you for confirming that those values exist. I was able to fix the problem, and your images once again pass, along with our regression tests that check other aspects of the special constants.