SETI / rms-hst-pipeline

Apache License 2.0

Product label errors flagged by validation tool #56

Open juzen2003 opened 1 year ago

juzen2003 commented 1 year ago
juzen2003 commented 6 months ago

Modifications made in https://github.com/SETI/rms-hst-pipeline/pull/76:

Issues that need to be reviewed and discussed:

juzen2003 commented 5 months ago

Current changes: (updates after 1/19/24 meeting)

Pending items:

juzen2003 commented 1 month ago

Update on 5/23/24:

Open issues related to label errors raised by the validator with the new dictionary PDS4_HST_1H00_1000:

  1. Update the dictionary to allow a unit attribute (value mrad/pixel) for hst:plate_scale. (For now the code works around this by removing the unit attribute, as discussed in the 5/4/24 hst Slack chat; it will be restored once the dictionary is updated.)
  2. Pending (based on the 5/10/24 hst Slack chat; waiting for Mark's input): should we use the Array_1D tag in place of Array_1D_Spectrum, or update the dictionary to allow the Array_1D_Spectrum tag?
    • Reference label: Dropbox/Shared-pdart/bundles_from_hst_pipeline/hst_05167/hst_05167-deliverable/miscellaneous_ghrs_shf/visit_01/z2no0101t_shf.xml
      <Array_1D_Spectrum>
      <name>Primary FITS data object</name>
      <local_identifier>fits_data_object_0</local_identifier>
      <offset unit="byte">23040</offset>
      <axes>1</axes>
      <axis_index_order>Last Index Fastest</axis_index_order>
      <description>
      Primary FITS data object: Standard header packet for this GHRS/D2 observation.
      </description>
      <Element_Array>
      <data_type>SignedMSB2</data_type>
      </Element_Array>
      <Axis_Array>
      <axis_name>Sample</axis_name>
      <elements>965</elements>
      <sequence_number>1</sequence_number>
      </Axis_Array>
      </Array_1D_Spectrum>
    • Error log:
      ERROR  [error.label.schema]   line 226, 24: cvc-complex-type.2.4.a: Invalid content was found starting with element '{"http://pds.nasa.gov/pds4/pds/v1":Array_1D_Spectrum}'. One of '{"http://pds.nasa.gov/pds4/pds/v1":Array, "http://pds.nasa.gov/pds4/pds/v1":Array_1D, "http://pds.nasa.gov/pds4/pds/v1":Array_2D, "http://pds.nasa.gov/pds4/pds/v1":Array_2D_Image, "http://pds.nasa.gov/pds4/pds/v1":Array_2D_Map, "http://pds.nasa.gov/pds4/pds/v1":Array_2D_Spectrum, "http://pds.nasa.gov/pds4/pds/v1":Array_3D, "http://pds.nasa.gov/pds4/pds/v1":Array_3D_Image, "http://pds.nasa.gov/pds4/pds/v1":Array_3D_Movie, "http://pds.nasa.gov/pds4/pds/v1":Array_3D_Spectrum, "http://pds.nasa.gov/pds4/pds/v1":Checksum_Manifest, "http://pds.nasa.gov/pds4/pds/v1":Encoded_Audio, "http://pds.nasa.gov/pds4/pds/v1":Encoded_Header, "http://pds.nasa.gov/pds4/pds/v1":Encoded_Image, "http://pds.nasa.gov/pds4/pds/v1":Header, "http://pds.nasa.gov/pds4/pds/v1":Stream_Text, "http://pds.nasa.gov/pds4/pds/v1":Table_Binary, "http://pds.nasa.gov/pds4/pds/v1":Table_Character, "http://pds.nasa.gov/pds4/pds/v1":Table_Delimited}' is expected.
  3. Input required: some values of the title tag under Identification_Area are too long. We need to either modify PRODUCT_LABEL.xml or increase the maximum length allowed for the title tag.

    • Reference label: Dropbox/Shared-pdart/bundles_from_hst_pipeline/hst_05167/hst_05167-deliverable/miscellaneous_ghrs_shf/visit_01/z2no0101t_shf.xml

      <Identification_Area>
      <logical_identifier>urn:nasa:pds:hst_5167:miscellaneous_ghrs_shf:z2no0101t</logical_identifier>
      <version_id>1.0</version_id>
      <title>
      z2no0101t_shf.fits: Standard header packet file, containing observation parameters,
      for this GHRS/D2 observation from HST Program 5167.
      
      Note that observation "z2no0101t" did not obtain science data. Only ancillary data
      files documenting this activity are available.
      </title>
      ...
    • Error log:
      ERROR  [error.label.schema]   line 23, 12: cvc-maxLength-valid: Value 'z2no0101t_shf.fits: Standard header packet file, containing observation parameters, for this GHRS/D2 observation from HST Program 5167. Note that observation "z2no0101t" did not obtain science data. Only ancillary data files documenting this activity are available.' with length = '265' is not facet-valid with respect to maxLength '255' for type 'title'.
      ERROR  [error.label.schema]   line 23, 12: cvc-type.3.1.3: The value 'z2no0101t_shf.fits: Standard header packet file, containing observation parameters, for this GHRS/D2 observation from HST Program 5167. Note that observation "z2no0101t" did not obtain science data. Only ancillary data files documenting this activity are available.' of element 'title' is not valid.
  4. Input required: inside the Observing_System tag, should we wrap the name tag in Observing_System_Component? (If it is wrapped in Observing_System_Component, the validator will expect additional tags such as type as well, which would be odd in this case.)
    • Reference label: Dropbox/Shared-pdart/bundles_from_hst_pipeline/hst_05167/hst_05167-deliverable/bundle.xml
      <Observing_System>
          <name>Hubble Space Telescope Goddard High Resolution Spectrograph</name>
          <Observing_System_Component>
              <name>Hubble Space Telescope</name>
              <type>Host</type>
              <Internal_Reference>
                  <lid_reference>urn:nasa:pds:context:instrument_host:spacecraft.hst</lid_reference>
                  <reference_type>is_instrument_host</reference_type>
              </Internal_Reference>
          </Observing_System_Component>
          <Observing_System_Component>
              <name>Goddard High Resolution Spectrograph</name>
              <type>Instrument</type>
              <Internal_Reference>
                  <lid_reference>urn:nasa:pds:context:instrument:hst.ghrs</lid_reference>
                  <reference_type>is_instrument</reference_type>
              </Internal_Reference>
          </Observing_System_Component>
          <name>Hubble Space Telescope Wide Field and Planetary Camera 2</name>
          <Observing_System_Component>
              <name>Hubble Space Telescope</name>
              <type>Host</type>
              <Internal_Reference>
                  <lid_reference>urn:nasa:pds:context:instrument_host:spacecraft.hst</lid_reference>
                  <reference_type>is_instrument_host</reference_type>
              </Internal_Reference>
          </Observing_System_Component>
          ...
    • Error log:
      ERROR  [error.label.schema]   line 78, 19: cvc-complex-type.2.4.a: Invalid content was found starting with element '{"http://pds.nasa.gov/pds4/pds/v1":name}'. One of '{"http://pds.nasa.gov/pds4/pds/v1":Observing_System_Component}' is expected.
  5. Input required: the value of hst:moving_target_description is too long. Should we update the dictionary to increase the maximum length allowed for the hst:moving_target_description tag?
    • Reference label: Dropbox/Shared-pdart/bundles_from_hst_pipeline/hst_16310/hst_16310-deliverable/miscellaneous_wfc3_jit/visit_01/ieab01fwj_jit.xml
      <hst:Pointing_Parameters>
      <hst:hst_target_name>2I-BORISOV</hst:hst_target_name>
      <hst:moving_target_flag>true</hst:moving_target_flag>
      <hst:moving_target_keyword>COMET</hst:moving_target_keyword>
      <hst:moving_target_keyword>interstellar comet</hst:moving_target_keyword>  
      <hst:moving_target_description>
      TYPE=COMET, Q=2.006581893840375, E=3.356215101434632, I=44.05257068647377,
      O=308.1487262895379, W=209.12367864468, T=08-DEC-2019:13:04:54,
      TTimeScale=TDB, EQUINOX=J2000, EPOCH=01-AUG-2020:00:00:00, EpochTimeScale=TDB,
      R0=2.808, DT=87.2916, A1=7.093444347382E-8, A2=-1.443811535835E-8,
      A3=6.534734368324E-10, ALN=0.1112620426, NM=2.15, NN=
      </hst:moving_target_description>
      ...
    • Error log:
      ERROR  [error.label.schema]   line 140, 42: cvc-maxLength-valid: Value 'TYPE=COMET, Q=2.006581893840375, E=3.356215101434632, I=44.05257068647377,O=308.1487262895379, W=209.12367864468, T=08-DEC-2019:13:04:54, TTimeScale=TDB, EQUINOX=J2000, EPOCH=01-AUG-2020:00:00:00, EpochTimeScale=TDB, R0=2.808, DT=87.2916, A1=7.093444347382E-8, A2=-1.443811535835E-8, A3=6.534734368324E-10, ALN=0.1112620426, NM=2.15, NN=' with length = '337' is not facet-valid with respect to maxLength '255' for type 'ASCII_Short_String_Collapsed'.
      ERROR  [error.label.schema]   line 140, 42: cvc-complex-type.2.2: Element 'hst:moving_target_description' must have no element [children], and the value must be valid.
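
The two maxLength errors above (items 3 and 5) report the length of the value after whitespace has been collapsed, which is why the multi-line text shows up as a single line in the log. A minimal sketch of a pre-check that approximates that count, assuming the 255-character limit quoted in the logs; `collapsed_length` and `too_long` are hypothetical helper names, not part of the pipeline or the validator:

```python
import re

# Limit quoted in the validator errors above (title and ASCII_Short_String_Collapsed).
MAX_SHORT_STRING = 255


def collapsed_length(text):
    """Return the length of text after XML-style whitespace collapsing.

    Tabs, newlines, and carriage returns become spaces, runs of spaces
    are merged into one, and leading/trailing spaces are trimmed.
    """
    collapsed = re.sub(r"[ \t\r\n]+", " ", text).strip(" ")
    return len(collapsed)


def too_long(text, limit=MAX_SHORT_STRING):
    """Return True if the collapsed text would exceed the given limit."""
    return collapsed_length(text) > limit
```

Running a check like this over the rendered title and hst:moving_target_description values before writing the label would flag the offending products without a full validator run.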
matthewtiscareno commented 1 month ago

1) I just submitted a SCR (change request) to the DDWG asking for mrad/pixel to be added to the dictionary. In the meantime, I agree with the workaround articulated by @juzen2003.

2) The Slack thread on 5/10/24 ended with me asking @markshowalter if there is any reason why we shouldn’t simply use Array_1D. He has not responded. I’ll put it on the agenda for our meeting this coming Tuesday.

3) Identification_Area is part of the core model, which we cannot modify unless we file a change request with the DDWG. Furthermore, it seems quite reasonable that a title should be limited to 255 characters. @markshowalter: Can some of the information in this title be moved to a comment?

4) This label snippet indeed seems faulty, and I think it's good that Validator has caught it. Any attribute name must be inside the class (in this case, Observing_System_Component) of which it is giving the name! In this case, I notice that:
   a) <name>Hubble Space Telescope Goddard High Resolution Spectrograph</name> is outside the instance of Observing_System_Component that contains the LID for that instrument; it should be inside the class, but there is already a duplicate name inside that class, so decide which one to use and discard the other one.
   b) <name>Hubble Space Telescope Wide Field and Planetary Camera 2</name> is also outside any instance of Observing_System_Component, but in fact there is no instance that includes the LID for WFPC2 -- there should be one if that instrument is relevant to this data product, or perhaps it was included extraneously.
   c) There are two apparently duplicate instances of Observing_System_Component that include the LID for the entire spacecraft HST -- there should be only one, unless there is a reason that I'm missing.
   We need input from @markshowalter on how to address this, so I'll bring it up at our meeting on Tuesday.

5) Hmm, I don't see anything in the dictionary that explicitly limits the length to 255 characters, unless it is the value_data_type of ASCII_Short_String_Collapsed. Anyway, this is part of the HST dictionary, which is under our control, so we have freedom to adjust as seems good. Let's consult Tuesday on this one also.

matthewtiscareno commented 1 month ago

Following up:

1) No change from previous answer. In the best-case it will be six months before this is fixed. Current text will eventually be okay, but in the meantime Dave to remove unit attribute.

2) Mark gave a good reason, and I have started sounding out the DDWG about creating Array_1D_Spectrum. The change may or may not go through, and the best-case is that it will be six months before it's fixed. In the meantime, let's change it to Array_1D and see whether it validates. Current text may or may not eventually be okay, but in the meantime Dave to update.

3) Mark says that, when there is a "note" that is part of the title, that text should go into a comment instead of being part of the title. This should keep our titles below 255 characters, which is a limit we cannot change. @mace-space suggests that <Context_Area>.<comment> is the right place for this text. Dave to fix.

4) The label here is garbled, and needs a software solution. I thought it was weird that there are two instruments (GHRS and WFPC2) mentioned in one label, but I've just realized that the example is a bundle.xml file, so it might be that data from both instruments are in the bundle so that both instruments really do belong. In any case, there should be only one instance of Observing_System_Component for the spacecraft itself (I see two) and there should be one instance of Observing_System_Component for each instrument. All instances of name should be inside the Observing_System_Component class. Dave to fix.

5) @mit3ch confirms that this is a fix that I can make. I just need to change the value_data_type for hst:moving_target_description from ASCII_Short_String_Collapsed to ASCII_String. I've now done that, so this problem should disappear. Matt has fixed.
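
For item 4, the cleanup described above (one Observing_System_Component per LID, no name left outside a component) could be sketched roughly as follows. This is only an illustration using Python's standard `xml.etree.ElementTree`; `tidy_observing_system` is a hypothetical helper, and the actual fix may well belong in the label templates rather than in post-processing:

```python
import xml.etree.ElementTree as ET

# PDS4 core namespace, as it appears in the validator error messages.
PDS_NS = "http://pds.nasa.gov/pds4/pds/v1"
NS = {"pds": PDS_NS}


def tidy_observing_system(obs_sys):
    """Clean up an Observing_System element in place.

    Assumption (from the discussion above): every <name> that is a direct
    child of Observing_System is dropped, and only the first
    Observing_System_Component for each lid_reference is kept.
    """
    seen_lids = set()
    for child in list(obs_sys):  # iterate over a copy while removing
        if child.tag == f"{{{PDS_NS}}}name":
            obs_sys.remove(child)
        elif child.tag == f"{{{PDS_NS}}}Observing_System_Component":
            lid = child.findtext(
                "pds:Internal_Reference/pds:lid_reference", namespaces=NS
            )
            if lid is not None and lid in seen_lids:
                obs_sys.remove(child)  # duplicate host or instrument entry
            elif lid is not None:
                seen_lids.add(lid)
    return obs_sys
```

Applied to the bundle.xml snippet quoted earlier, this would leave one component for the HST host and one for GHRS. It would not add the missing WFPC2 component; that still has to come from the template or the pipeline.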

juzen2003 commented 2 weeks ago

Updates implemented in https://github.com/SETI/rms-hst-pipeline/pull/76 based on previous comments (06/17/24)

  1. Removed the unit attribute from hst:plate_scale.
  2. Used Array_1D in place of the Array_1D_Spectrum HDU data class for now; this will be changed back to Array_1D_Spectrum once the DDWG has an update.
  3. Updated the product label template: moved the "note that" part of the <title> content under <Identification_Area> to <Context_Area>.<comment>.
  4. Fixed the contents of the <Observing_System_Component> tag under <Observing_System> in the bundle, product collection, and product label templates. There is now one instance of Observing_System_Component for the spacecraft itself (host) and one instance of Observing_System_Component for each instrument.
  5. Updated the validator to use the latest schemas and schematrons.
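
For readers who want to see what updates 1 to 3 amount to on a label, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the code from the pull request; `local_name`, `apply_workarounds`, and `split_title` are hypothetical names, and elements are matched by local name so that no namespace URI has to be assumed:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def apply_workarounds(root):
    """Apply the temporary label workarounds in place.

    - drop the unit attribute from plate_scale (update 1)
    - rename Array_1D_Spectrum to Array_1D, keeping its namespace (update 2)
    """
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments / processing instructions, if any
        name = local_name(elem.tag)
        if name == "plate_scale":
            elem.attrib.pop("unit", None)
        elif name == "Array_1D_Spectrum":
            elem.tag = elem.tag.replace("Array_1D_Spectrum", "Array_1D")
    return root


def split_title(title, marker="Note that"):
    """Split a title into (short_title, note) at the first marker (update 3).

    The note is None when the marker is absent. Whitespace is collapsed
    in both parts. The marker text is an assumption based on the example
    title quoted earlier in this issue.
    """
    head, sep, tail = title.partition(marker)
    short_title = " ".join(head.split())
    note = " ".join((sep + tail).split()) if sep else None
    return short_title, note
```

Both workarounds in `apply_workarounds` are meant to be reverted once the dictionary changes land, so keeping them in one small, clearly named function should make that easier.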