NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine

add field for data check output to `result` schema #450

Open jeanetteclark opened 2 months ago

jeanetteclark commented 2 months ago

After discussing with @robyngit, we determined that we really need to be able to filter results from data quality runs by file. This isn't necessarily MVP for the data suite, but I think it's close. The easiest way to do this, I think, would be to add a field to the result schema within run. This would allow us to continue to use the output field to give dataset-level results (eg: 8 files had congruent file types, 1 file was mistyped), in addition to file-level results (eg: file.nc appears to be of type application/netcdf but is documented as application/octet-stream).

  <xs:complexType name="run">
    <xs:sequence>
      <xs:element name="id" type="xs:string"/>
      <xs:element name="timestamp" type="xs:dateTime"/>
      <xs:element name="objectIdentifier" type="xs:string" minOccurs="0"/>
      <xs:element name="suiteId" type="xs:string" minOccurs="0"/>
      <xs:element name="result" type="tns:result" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

A proposed new schema for result could look something like this (though I'm very much not sold on "dataOutput" as the element name; it's a placeholder):

  <xs:complexType name="result">
    <xs:sequence>
      <xs:element name="check" type="tns:check" minOccurs="0"/>
      <xs:element name="timestamp" type="xs:dateTime" minOccurs="0"/>
      <xs:element name="output" type="tns:output" nillable="true" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="status" type="tns:status" minOccurs="0"/>
      <xs:element name="dataOutput" type="tns:dataOutput" nillable="true" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="dataOutput">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="id" type="xs:string" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
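
To make the proposal concrete, here is a minimal Python sketch of what a result instance could look like under this schema: one dataset-level `output` plus one `dataOutput` per file. The helper name `build_result` is hypothetical, namespaces are omitted for brevity, and using the file name as the `id` value is an assumption (the proposal only says `id` is an optional string attribute).

```python
import xml.etree.ElementTree as ET


def build_result(summary, file_messages):
    """Build a <result> element with a dataset-level <output> and one
    <dataOutput id="..."> per file.

    "dataOutput" is the placeholder element name from the proposal, and
    using the file name as the id is an assumption made for illustration.
    """
    result = ET.Element("result")
    # Dataset-level summary stays in the existing output field.
    ET.SubElement(result, "output").text = summary
    # File-level messages go in the proposed repeatable element.
    for file_id, message in file_messages.items():
        data_output = ET.SubElement(result, "dataOutput", {"id": file_id})
        data_output.text = message
    return result


result = build_result(
    "8 files had congruent file types, 1 file was mistyped",
    {
        "file.nc": "file.nc appears to be of type application/netcdf "
                   "but is documented as application/octet-stream",
    },
)
print(ET.tostring(result, encoding="unicode"))
```

Keeping the summary in `output` means existing consumers of dataset-level results would not need to change; only code that wants per-file detail would read the new element.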

thoughts welcome @mbjones when you return

jeanetteclark commented 2 months ago

Since the output field is repeatable, we just need to add id as an optional attribute.
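
As a rough sketch of how a consumer could filter by file under this simpler approach, the Python below treats `output` elements without an `id` as dataset-level and groups the rest by `id`. The sample XML, the function name `split_outputs`, and the use of a file name as the `id` value are all assumptions for illustration; namespaces are omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical result fragment: one dataset-level output, one file-level output.
RESULT_XML = """
<result>
  <output>8 files had congruent file types, 1 file was mistyped</output>
  <output id="file.nc">file.nc appears to be of type application/netcdf but is documented as application/octet-stream</output>
</result>
"""


def split_outputs(result_xml):
    """Separate dataset-level outputs (no id) from file-level outputs (with id)."""
    result = ET.fromstring(result_xml.strip())
    dataset_level = []
    by_file = {}
    for output in result.findall("output"):
        file_id = output.get("id")
        if file_id is None:
            dataset_level.append(output.text)
        else:
            # A file may have several outputs, so collect them in a list.
            by_file.setdefault(file_id, []).append(output.text)
    return dataset_level, by_file


dataset_level, by_file = split_outputs(RESULT_XML)
print(dataset_level)
print(by_file.get("file.nc", []))
```

Because the attribute is optional, existing results without an `id` would still parse and simply land in the dataset-level list.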

jeanetteclark commented 1 month ago

this is done on the feature-hashstore-support branch, awaiting review

see src/main/resources/schemas/schema1.1.xsd

also see edu.ucsb.nceas.mdqengine.dispatch.Dispatcher line 174 for how an array coming back from python checks is handled (warning: it's a little ugly right now)