NCEAS / metadig-checks

MetaDIG suites and checks for data and metadata improvement and guidance.
Apache License 2.0

entity.format.nonproprietary #64

Status: Closed (closed by gothub 2 years ago)

gothub commented 5 years ago

Description

Check if each entity format is non-proprietary

Priority

Choose a priority for the FAIR suite (Required or Optional)

FAIR: Required

Issues

Procedure

gothub commented 5 years ago

For background info on this check, see https://github.com/NCEAS/metadig-checks/issues/23

emilyarobles commented 2 years ago

Response for failed check: ESS-DIVE recommends the use of non-proprietary file formats where possible. Review the [name file types included] file types included in your dataset and consider changing them to non-proprietary formats.

mbjones commented 2 years ago

@emilyarobles Other repositories use this check as well. Can we make the response wording repository-agnostic so it can be used across systems?

gothub commented 2 years ago

ESS-DIVE may have other formats to add to the known formats list.

gothub commented 2 years ago

@mbjones

After viewing ESS-DIVE metadata examples, I noticed that MetacatUI inserts the media type into the EML element /eml/dataset/otherEntity/entityType for data objects added via the editor. For example:

    <otherEntity id="urn-uuid-86f9dbef-f12f-4802-99f7-b3d05368b5a3">
      <entityName>DataONE_FAIR_Quality_Suite.csv</entityName>
      <entityDescription>this is a list of d1 checks</entityDescription>
      <entityType>text/csv</entityType>
    </otherEntity>

The entity.format.nonproprietary check only inspects the EML elements selected by this XPath:

/eml/dataset/*/physical/dataFormat/externallyDefinedFormat/formatName

Should this check also inspect `/eml/dataset/otherEntity/entityType`?
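
To make the difference between the two locations concrete, here is a minimal sketch in Python using only the standard library. It is not the check's actual implementation: the EML fragment is a trimmed, hypothetical example, and `collect_formats` is a name invented for illustration. The paths are written relative to the root element because that is how `ElementTree.findall` expects them.

```python
import xml.etree.ElementTree as ET

# Hypothetical EML fragment, trimmed to the elements discussed in this thread.
EML = """
<eml>
  <dataset>
    <dataTable>
      <entityName>sites.csv</entityName>
      <physical>
        <dataFormat>
          <externallyDefinedFormat>
            <formatName>text/csv</formatName>
          </externallyDefinedFormat>
        </dataFormat>
      </physical>
    </dataTable>
    <otherEntity>
      <entityName>DataONE_FAIR_Quality_Suite.csv</entityName>
      <entityType>text/csv</entityType>
    </otherEntity>
  </dataset>
</eml>
"""

# Location the check currently inspects (relative to the <eml> root).
FORMAT_NAME_PATH = "dataset/*/physical/dataFormat/externallyDefinedFormat/formatName"
# Location where the editor places the media type for otherEntity objects.
ENTITY_TYPE_PATH = "dataset/otherEntity/entityType"


def collect_formats(eml_text):
    """Return the text values found at each of the two locations."""
    root = ET.fromstring(eml_text)

    def texts(path):
        return [(el.text or "").strip() for el in root.findall(path)]

    return {
        "formatName": texts(FORMAT_NAME_PATH),
        "entityType": texts(ENTITY_TYPE_PATH),
    }


found = collect_formats(EML)
print(found)
```

In this fragment the `otherEntity` has no `physical` section, so a check that only reads `formatName` never sees its media type.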

mbjones commented 2 years ago

Yeah, that check algorithm sounds possibly incomplete, although it is debatable whether otherEntity/entityType should be used. For certain I think:

1) //physical/dataFormat/textFormat are all non-proprietary text formats
2) //physical/dataFormat/binaryRasterFormat are all non-proprietary BIP and BIL formats
3) //physical/dataFormat/externallyDefinedFormat may be proprietary or non-proprietary depending on the value found in ./formatName
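
The three rules above could be sketched roughly as follows. This is an illustration only, not the existing check: `classify_data_format`, the result labels, and the `KNOWN_NONPROPRIETARY` set are all assumptions made up for this example, and the real known formats list is maintained elsewhere.

```python
import xml.etree.ElementTree as ET

# Assumption: a tiny illustrative allow-list, not the list the check uses.
KNOWN_NONPROPRIETARY = {"text/csv", "text/plain", "image/png"}


def classify_data_format(data_format):
    """Classify one <dataFormat> element according to rules 1-3."""
    # Rule 1: textFormat always describes a non-proprietary text format.
    if data_format.find("textFormat") is not None:
        return "non-proprietary"
    # Rule 2: binaryRasterFormat always describes a non-proprietary raster format.
    if data_format.find("binaryRasterFormat") is not None:
        return "non-proprietary"
    # Rule 3: externallyDefinedFormat depends on the value of ./formatName.
    ext = data_format.find("externallyDefinedFormat")
    if ext is not None:
        name = (ext.findtext("formatName") or "").strip().lower()
        if name in KNOWN_NONPROPRIETARY:
            return "non-proprietary"
        return "unknown-or-proprietary"
    # No recognised child element: nothing can be concluded.
    return "undetermined"


example = ET.fromstring(
    "<dataFormat><externallyDefinedFormat>"
    "<formatName>text/csv</formatName>"
    "</externallyDefinedFormat></dataFormat>"
)
print(classify_data_format(example))
```

A name that is missing from the list is reported as `unknown-or-proprietary` rather than `proprietary`, since absence from an allow-list does not prove a format is proprietary.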

EML's otherEntity type can have a physical section and therefore could be used to describe types as above. It also has the //otherEntity/entityType field, which is defined as:

The entityType field contains the name of the entity's type. The entity's type is typically the name of the type of data represented in the entity, such as "photograph". This field is used only if this is an 'other' entity and you want to specify the kind of "other" entity this is.

Note that the example for this field is a value like "photograph" or "Photograph" that is uncontrolled and is meant to qualify the "otherness" of otherEntity. And it is optional. So, while MetacatUI seems to put a value there, I think the right location for controlled entity format information is in the //physical/dataFormat section as described above. That said, if we've been using it consistently for MIME type info, it might be something we should discuss. It may have been used for MIME type info because //textFormat doesn't have a MIME type field AFAICT.
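
If entityType were ever considered for assessment, one open question is how to tell an editor-inserted media type from a free-text value such as "photograph". A rough sketch of one possible heuristic, assuming a `type/subtype` pattern loosely based on the restricted-name characters in RFC 6838; `looks_like_media_type` is a hypothetical helper, not part of any existing check:

```python
import re

# Assumption: "type/subtype" built from RFC 6838 restricted-name characters.
# This only tests the shape of the string, not whether the type is registered.
MEDIA_TYPE = re.compile(
    r"[a-z0-9][a-z0-9!#$&^_.+-]*/[a-z0-9][a-z0-9!#$&^_.+-]*",
    re.IGNORECASE,
)


def looks_like_media_type(entity_type):
    """Return True if the value has the shape of a media type."""
    return MEDIA_TYPE.fullmatch(entity_type.strip()) is not None


for value in ["text/csv", "photograph", "Photograph"]:
    print(value, looks_like_media_type(value))
```

A shape test like this cannot say whether a format is proprietary; it would only decide whether the value is worth looking up in a known formats list at all.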

gothub commented 2 years ago

@laurenwalker what are your thoughts on using //otherEntity/entityType for metadata assessment?

gothub commented 2 years ago

ESS-DIVE has decided to use [entity.type.nonproprietary](https://github.com/NCEAS/metadig-checks/issues/436) instead of this check.