NCEAS / metadig-checks

MetaDIG suites and checks for data and metadata improvement and guidance.
Apache License 2.0

Make distinction between entity and resource more explicit in some entity level checks #400

Open jeanetteclark opened 3 years ago

jeanetteclark commented 3 years ago

Related to https://github.com/NCEAS/metadig-checks/issues/18. Some entity-level checks, specifically:

- entity.identifier.present
- entity.identifierType.present
- entity.format.present
- entity.type.present
- entity.format.nonproprietary

look in places that seem to point to the resource. If the resource is an entity, then this is appropriate. However, most of the metadata we are looking at describes dataset resources, which contain entities but are not entities themselves.

For example, entity.identifier.present looks in /*/identificationInfo/*/citation/CI_Citation/identifier/MD_Identifier/code//text()[normalize-space()] for ISO. If the resource is an entity (like a feature, for example), this location makes sense. However, if the resource is a dataset, this location does not seem appropriate, since it holds the identifier of the dataset as a whole and not of an entity within the dataset. I think that, as the checks are written, entities and datasets can easily be conflated.
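To make the conflation concrete, here is a minimal sketch in Python. It assumes a toy, namespace-free ISO-like record (real ISO documents are namespaced) and uses the standard library's limited XPath support, so the path is a simplified stand-in for the one the check uses. The sample record, the identifier value, and the function name `resource_identifiers` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free record describing a *dataset* resource.
SAMPLE = """<MD_Metadata>
  <identificationInfo>
    <MD_DataIdentification>
      <citation>
        <CI_Citation>
          <identifier>
            <MD_Identifier>
              <code><CharacterString>doi:10.0000/example-dataset</CharacterString></code>
            </MD_Identifier>
          </identifier>
        </CI_Citation>
      </citation>
    </MD_DataIdentification>
  </identificationInfo>
  <hierarchyLevel><MD_ScopeCode codeListValue="dataset">dataset</MD_ScopeCode></hierarchyLevel>
</MD_Metadata>"""


def resource_identifiers(root):
    """Return non-empty identifier codes found under the resource citation."""
    # Simplified equivalent of the check's path, relative to the root element.
    path = "./identificationInfo/*/citation/CI_Citation/identifier/MD_Identifier/code"
    texts = ("".join(code.itertext()).strip() for code in root.findall(path))
    return [t for t in texts if t]


# The only match is the dataset-level identifier, yet an entity check reading
# this location would report that an entity identifier is present.
print(resource_identifiers(ET.fromstring(SAMPLE)))
```

In this toy record nothing describes an individual entity, so a "present" result from this location says nothing about entities within the dataset.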

One solution is to be more explicit in the entity checks, and only look at resource-level information if the resource is itself an entity. I've listed the resource types for DataCite and ISO below, with ones I think could be considered entities in bold (some I am really not sure about). These resource specifications come from MD_ScopeCode for ISO and resourceTypeGeneral for DataCite.
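The proposed solution could be sketched roughly as follows, again in Python against a toy, namespace-free record. The set `ENTITY_LIKE_SCOPES` is a placeholder: only "feature" is taken from the example given earlier in this issue, and which other scope codes count as entities is still undecided. The helper names and sample identifiers are hypothetical, and this is not how the checks are currently implemented.

```python
import xml.etree.ElementTree as ET

# Placeholder: "feature" comes from the example in this issue; the full set of
# entity-like scope codes has not been decided.
ENTITY_LIKE_SCOPES = {"feature"}

SCOPE_PATHS = (
    "./metadataScope/MD_MetadataScope/resourceScope/MD_ScopeCode",
    "./hierarchyLevel/MD_ScopeCode",
)
IDENTIFIER_PATH = "./identificationInfo/*/citation/CI_Citation/identifier/MD_Identifier/code"


def resource_scope(root):
    """Return the first scope code found for the resource, or None."""
    for path in SCOPE_PATHS:
        el = root.find(path)
        if el is not None:
            value = el.get("codeListValue") or (el.text or "").strip()
            if value:
                return value
    return None


def entity_identifiers(root):
    """Use resource-level identifiers only when the resource is itself an entity."""
    if resource_scope(root) not in ENTITY_LIKE_SCOPES:
        return []
    texts = ("".join(code.itertext()).strip() for code in root.findall(IDENTIFIER_PATH))
    return [t for t in texts if t]


def make_doc(scope, identifier):
    """Build a hypothetical record with the given scope code and identifier."""
    return ET.fromstring(
        f"""<MD_Metadata>
  <identificationInfo><MD_DataIdentification><citation><CI_Citation>
    <identifier><MD_Identifier><code>{identifier}</code></MD_Identifier></identifier>
  </CI_Citation></citation></MD_DataIdentification></identificationInfo>
  <hierarchyLevel><MD_ScopeCode codeListValue="{scope}">{scope}</MD_ScopeCode></hierarchyLevel>
</MD_Metadata>"""
    )


print(entity_identifiers(make_doc("dataset", "doi:10.0000/example-dataset")))
print(entity_identifiers(make_doc("feature", "feature-001")))
```

With this guard, a dataset-scoped record contributes no entity identifiers from the resource citation, while a feature-scoped record still does.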

Once we decide on a path forward, I can propose more specific fixes.

jeanetteclark commented 3 years ago

Here is a list of checks and xpaths that I think conflate entity and resource.

| check | xpath | reasoning |
| --- | --- | --- |
| entity.format.nonproprietary | /*/identificationInfo/MD_DataIdentification/resourceFormat/MD_Format | this seems to point to the resource, not the entity specifically |
| entity.format.nonproprietary | /*/identificationInfo/MD_DataIdentification/resourceFormat/MD_Format/formatSpecificationCitation/CI_Citation/identifier/MD_Identifier/code | this seems to point to the resource, not the entity specifically |
| entity.format.nonproprietary | /*/identificationInfo/MD_DataIdentification/resourceFormat/MD_Format/formatSpecificationCitation/CI_Citation/title | this seems to point to the resource, not the entity specifically |
| entity.identifier.present | //identificationInfo//citation/CI_Citation/identifier/MD_Identifier/code//text()[normalize-space()] | this is the resource identifier and not the entity identifier |
| entity.identifier.present | //identificationInfo//citation/CI_Citation/identifier/RS_Identifier/code//text()[normalize-space()] | I don't think a reference system is an entity |
| entity.identifierType.present | //identificationInfo//citation/CI_Citation/identifier/MD_Identifier/codeSpace//text()[normalize-space()] | this is the resource identifier and not the entity identifier |
| entity.identifierType.present | //identificationInfo//citation/CI_Citation/identifier/MD_Identifier/authority//text()[normalize-space()] | this is the resource identifier and not the entity identifier |
| entity.name.present | /*/contentInfo/MD_CoverageDescription/attributeDescription/RecordType | this is the entity description, not the entity name |
| entity.name.present | /*/contentInfo/MI_CoverageDescription/attributeDescription/RecordType | this is the entity description, not the entity name |
| entity.type.present | /*/metadataScope/MD_MetadataScope/resourceScope/MD_ScopeCode | this describes the scope of the resource, not the entity specifically |
| entity.type.present | /*/hierarchyLevel/MD_ScopeCode | this describes the scope of the resource, not the entity specifically |

jeanetteclark commented 3 years ago

I removed all of the above xpaths, since they fairly clearly reference resource information and not entity information. See PR #409

Relevant commits:

- https://github.com/NCEAS/metadig-checks/pull/409/commits/f34b616a409b08920bc8ed44d2bd710dc04825ec (ISO)
- https://github.com/NCEAS/metadig-checks/pull/409/commits/cc2e75959cadf3760a2e2f34911d7542f28ae18c (ISO)
- https://github.com/NCEAS/metadig-checks/pull/409/commits/6216fd90e48f3b41f26199eb6a09c72a7ce6286b (ISO)
- https://github.com/NCEAS/metadig-checks/pull/409/commits/fc3c8dd38a5184e5ecc8c4b15c1cd350abfbbac9 (ISO)
- https://github.com/NCEAS/metadig-checks/pull/409/commits/6177840ad9ee8cc317bc166b2a26545d3fbac292 (ISO)
- https://github.com/NCEAS/metadig-checks/pull/408/commits/e784480deb7fba4ca09f747ffcbb48608c38f41f (DataCite)
- https://github.com/NCEAS/metadig-checks/pull/408/commits/832e94020d50ad8d0be03c36dac75c7c8093fb24 (DataCite)
- https://github.com/NCEAS/metadig-checks/pull/408/commits/9ddc99c532656d6690b54fb4074ce765e37ed247 (DataCite)
- https://github.com/NCEAS/metadig-checks/pull/408/commits/27107c81b62f5ce245d74ee55ea7447bc511f2e5 (DataCite)