Open jeanetteclark opened 3 years ago
Here is a list of checks and xpaths that I think conflate entity and resource.
check | xpath | reasoning |
---|---|---|
entity.format.nonproprietary | /*/identificationInfo/MD_DataIdentification/resourceFormat/MD_Format | this seems to point to the resource, not the entity specifically |
entity.format.nonproprietary | /*/identificationInfo/MD_DataIdentification/resourceFormat/MD_Format/formatSpecificationCitation/CI_Citation/identifier/MD_Identifier/code | this seems to point to the resource, not the entity specifically |
entity.format.nonproprietary | /*/identificationInfo/MD_DataIdentification/resourceFormat/MD_Format/formatSpecificationCitation/CI_Citation/title | this seems to point to the resource, not the entity specifically |
entity.identifier.present | //identificationInfo//citation/CI_Citation/identifier/MD_Identifier/code//text()[normalize-space()] | this is the resource identifier and not the entity identifier |
entity.identifier.present | //identificationInfo//citation/CI_Citation/identifier/RS_Identifier/code//text()[normalize-space()] | I don't think a reference system is an entity |
entity.identifierType.present | //identificationInfo//citation/CI_Citation/identifier/MD_Identifier/codeSpace//text()[normalize-space()] | this is the resource identifier and not the entity identifier |
entity.identifierType.present | //identificationInfo//citation/CI_Citation/identifier/MD_Identifier/authority//text()[normalize-space()] | this is the resource identifier and not the entity identifier |
entity.name.present | /*/contentInfo/MD_CoverageDescription/attributeDescription/RecordType | this is the entity description, not the entity name |
entity.name.present | /*/contentInfo/MI_CoverageDescription/attributeDescription/RecordType | this is the entity description, not the entity name |
entity.type.present | /*/metadataScope/MD_MetadataScope/resourceScope/MD_ScopeCode | this describes the scope of the resource, not the entity specifically |
entity.type.present | /*/hierarchyLevel/MD_ScopeCode | this describes the scope of the resource, not the entity specifically |
I removed all of the above xpaths, since they fairly clearly reference resource information and not entity information. See PR #409
Relevant commits:

- https://github.com/NCEAS/metadig-checks/pull/409/commits/f34b616a409b08920bc8ed44d2bd710dc04825ec (ISO)
- https://github.com/NCEAS/metadig-checks/pull/409/commits/cc2e75959cadf3760a2e2f34911d7542f28ae18c (ISO)
- https://github.com/NCEAS/metadig-checks/pull/409/commits/6216fd90e48f3b41f26199eb6a09c72a7ce6286b (ISO)
- https://github.com/NCEAS/metadig-checks/pull/409/commits/fc3c8dd38a5184e5ecc8c4b15c1cd350abfbbac9 (ISO)
- https://github.com/NCEAS/metadig-checks/pull/409/commits/6177840ad9ee8cc317bc166b2a26545d3fbac292 (ISO)
- https://github.com/NCEAS/metadig-checks/pull/408/commits/e784480deb7fba4ca09f747ffcbb48608c38f41f (DataCite)
- https://github.com/NCEAS/metadig-checks/pull/408/commits/832e94020d50ad8d0be03c36dac75c7c8093fb24 (DataCite)
- https://github.com/NCEAS/metadig-checks/pull/408/commits/9ddc99c532656d6690b54fb4074ce765e37ed247 (DataCite)
- https://github.com/NCEAS/metadig-checks/pull/408/commits/27107c81b62f5ce245d74ee55ea7447bc511f2e5 (DataCite)
Related to https://github.com/NCEAS/metadig-checks/issues/18. Some entity-level checks, specifically:

- entity.identifier.present
- entity.identifierType.present
- entity.format.present
- entity.type.present
- entity.format.nonproprietary

look in places that seem to point to the resource. If the resource is an entity, then this is appropriate. However, most of the metadata we are looking at describes dataset resources, which contain entities but are not entities themselves.
For example, `entity.identifier.present` looks in

`/*/identificationInfo/*/citation/CI_Citation/identifier/MD_Identifier/code//text()[normalize-space()]`

for ISO. If the resource is an entity (like a feature, for example), this location makes sense. However, if the resource is a dataset, this location does not seem appropriate, since it will hold the identifier of the dataset as a whole and not of an entity within the dataset. As the checks are written, I think entities and datasets can easily be conflated.

One solution is to be more explicit in the entity checks, and only look at resource-level information if the resource is itself an entity. I've listed the resource types for DataCite and ISO below, with the ones I think could be considered entities in bold (some I am really not sure about). These resource types come from `MD_ScopeCode` for ISO and `resourceTypeGeneral` for DataCite.

Once we decide on a path forward, I can propose more specific fixes.
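
To make the proposal concrete, here is a rough sketch of what "only look at resource-level information if the resource is itself an entity" could look like for ISO. It is not how the checks are currently written. The `ENTITY_SCOPE_CODES` set, the function names, and the sample records are all hypothetical, and which `MD_ScopeCode` values should count as entities is exactly the open question above (only `feature` is used here because it is the one example mentioned).

```python
import xml.etree.ElementTree as ET

# Namespaces used by ISO 19139 encoded metadata.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Assumption: the list of scope codes that count as entities is undecided.
# "feature" is the only value named in the discussion.
ENTITY_SCOPE_CODES = {"feature"}

SCOPE_PATH = "gmd:hierarchyLevel/gmd:MD_ScopeCode"
RESOURCE_ID_PATH = (
    "gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation"
    "/gmd:identifier/gmd:MD_Identifier/gmd:code/gco:CharacterString"
)


def resource_is_entity(root):
    """Return True if any declared scope code is in ENTITY_SCOPE_CODES."""
    for node in root.findall(SCOPE_PATH, NS):
        code = node.get("codeListValue") or (node.text or "").strip()
        if code in ENTITY_SCOPE_CODES:
            return True
    return False


def entity_identifiers(root):
    """Return resource-level identifiers only when the resource is an entity."""
    if not resource_is_entity(root):
        # A dataset-level identifier is not an entity identifier, so skip it.
        return []
    found = []
    for node in root.findall(RESOURCE_ID_PATH, NS):
        text = (node.text or "").strip()
        if text:
            found.append(text)
    return found


def make_record(scope, identifier):
    """Build a tiny ISO-like record for demonstration (hypothetical data)."""
    return ET.fromstring(
        f"""<gmd:MD_Metadata xmlns:gmd="{NS['gmd']}" xmlns:gco="{NS['gco']}">
  <gmd:hierarchyLevel>
    <gmd:MD_ScopeCode codeListValue="{scope}">{scope}</gmd:MD_ScopeCode>
  </gmd:hierarchyLevel>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:identifier>
            <gmd:MD_Identifier>
              <gmd:code>
                <gco:CharacterString>{identifier}</gco:CharacterString>
              </gmd:code>
            </gmd:MD_Identifier>
          </gmd:identifier>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""
    )


dataset_record = make_record("dataset", "doi:10.0000/example")
feature_record = make_record("feature", "feature-001")

print(entity_identifiers(dataset_record))
print(entity_identifiers(feature_record))
```

With this gating, a dataset-scoped record contributes no entity identifier from the citation, while a feature-scoped record does. The same pattern could presumably be applied to DataCite by testing `resourceTypeGeneral` first, once we agree on which values are entities.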