DOI-USGS / gems-tools-pro

GeMS Tools for ArcGIS Pro
Creative Commons Zero v1.0 Universal

Build Metadata tool (2.14.4; version of 8/21/23): problems with missing entity and attribute items and unrepresentable domains #79

Open alwunder opened 1 year ago

alwunder commented 1 year ago

I've been testing the Validate tool and the Build Metadata tool on an empty copy of our complete TNGeMS schema looking for issues. Basically, everything is good. However, I noticed that when running the Build Metadata tool, I am getting a huge number of errors in a couple different categories:

  1. Warnings for missing Entity_and_Attribute_Overview and Entity_and_Attribute_Detail_Citation for the following entity and attribute items, which should be part of the GeMS definitions:
    • Entities:
      • GenericPoints
      • FossilPoints
    • Attributes:
      • Lithology
      • ScientificConfidence
      • ErrorMeasure
      • FossilForms
      • FossilFormsSourceID
      • FossilAgeSourceID

I was able to correct the errors by adding the following lines to my_definitions.py myEntityDict and myAttribDict sections:

    'GenericPoints': ['Basic point feature class in a GeMS-compliant database', 'GeMS tools definition'],
    'FossilPoints': ['Stores locations of fossil localities and their types, as well as associated age and interpretive information. Analytical data may be represented either by using an "ExtendedAttributes" table or, if many data are of a single fossil type, by placing them in a user-defined, type-specific table such as "CrinoidData"', 'GeMS tools definition'],

and

    'Lithology': ['Indicates lithology found within this map unit. Domain is CGI’s Simple Lithology vocabulary (available at http://resource.geosciml.org/def/voc/)', 'GeMS tools definition'],
    'ScientificConfidence': ['Indicates how confidently existence and identity of lithology is identified as being found within this map unit', 'GeMS tools definition'],
    'ErrorMeasure': ['Measure of error whose values are recorded in AgePlusError and AgeMinusError fields', 'GeMS tools definition'],
    'FossilForms': ['Records verbose description of feature represented by this database row', 'GeMS tools definition'],
    'FossilFormsSourceID': ['Identifies source of feature and its attributes. Foreign key to DataSources table', 'GeMS tools definition'],
    'FossilAgeSourceID': ['Identifies source of age interpretation for this sample. Foreign key to DataSources table', 'GeMS tools definition'],
  2. The second issue is with unrepresentable domains. I got a huge number of errors (~200) for missing Attribute_Domain_Values. I looked through the metadata tool XML output and found that these errors occur only in fields that are part of the enumeratedValueDomainFieldList. I was able to correct this by adding the following lines to the myUnrepresentableDomainDict section of my_definitions.py:
    ### ADDED ALL enumeratedValueDomainFieldList ENTRIES FROM GeMS_Definition.py
    ### THIS FIXED THE "Attribute_Domain_Values is required in Attribute" METADATA ERROR
    'Type':'TEST-Arbitrary string',
    'LocationMethod':'TEST-Arbitrary string',
    'PartType':'TEST-Arbitrary string',
    #'ProportionTerm':'TEST-Arbitrary string',#This term was not causing an error
    'TimeScale':'TEST-Arbitrary string', #Not present in TNGeMS database
    'ExistenceConfidence':'TEST-Arbitrary string',
    'IdentityConfidence':'TEST-Arbitrary string',
    'ScientificConfidence':'TEST-Arbitrary string',
    'ParagraphStyle':'TEST-Arbitrary string',
    'AgeUnits':'TEST-Arbitrary string',
    'MapUnit':'TEST-Arbitrary string',
    'DataSourceID':'TEST-Arbitrary string',
    'DescriptionSourceID':'TEST-Arbitrary string',
    'DefinitionSourceID':'TEST-Arbitrary string',
    'LocationSourceID':'TEST-Arbitrary string',
    'OrientationSourceID':'TEST-Arbitrary string',
    'AnalysisSourceID':'TEST-Arbitrary string',
    'GeoMaterial':'TEST-Arbitrary string',
    'GeoMaterialConfidence':'TEST-Arbitrary string',

    It seems there may be an issue in the # UNREPRESENTABLE DOMAINS section of GeMS_FGDCMetadata.py in the else: statement near the end that deals with unrepresentable domains.
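
A quick way to see which entity and attribute names still lack a definition is to compare them against the user dictionaries. The following is a minimal sketch, not the tool's own logic: the dictionaries are abbreviated stand-ins for myEntityDict and myAttribDict in my_definitions.py, and `find_undefined` is a hypothetical helper.

```python
# Abbreviated, hypothetical stand-ins for myEntityDict and myAttribDict.
# Each value is [definition, definition source], as in the lines above.
my_entity_dict = {
    "FossilPoints": ["Stores locations of fossil localities", "GeMS tools definition"],
}
my_attrib_dict = {
    "Lithology": ["Indicates lithology found within this map unit", "GeMS tools definition"],
}


def find_undefined(names, *definition_dicts):
    """Return the names that have no entry in any of the given dictionaries."""
    return [n for n in names if not any(n in d for d in definition_dicts)]


entities = ["GenericPoints", "FossilPoints"]
attributes = ["Lithology", "ErrorMeasure", "FossilForms"]

undefined_entities = find_undefined(entities, my_entity_dict)
undefined_attributes = find_undefined(attributes, my_attrib_dict)
print(undefined_entities)    # entities still lacking a definition
print(undefined_attributes)  # attributes still lacking a definition
```

Because `find_undefined` accepts any number of dictionaries, the built-in definitions could be passed alongside the user ones to check both at once.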

ethoms-usgs commented 1 year ago

Honestly, most of this is intended. I think we only added definitions in GeMS_Definitions for tables and fields that are in the required or as-needed example sections of the GeMS document. Aside from the required SourceID fields, the ones you list are essentially custom fields for which your definition might be a little different from someone else's. And even the SourceID fields, with the addition of the prefixes FossilForms and FossilAge, could have custom definitions. That said, I like your definitions. I doubt many would object if we added them to GeMS_Definitions. I will bring it up.

But with GenericPoints, 'Generic' is meant as a placeholder for you to replace with a custom name. It's necessary for the automation of creating database tables but should be changed to describe the first-order 'Type' of the points within.

I am confused about 2. The values in those fields should be documented in the metadata as enumerated domains. It seems to me that the error you are getting is valid and indicates that the values in the fields are not found in the appropriate data dictionary tables: Glossary, DescriptionOfMapUnits, GeoMaterialDict, DataSources. Is that true, or is the tool not building the enumerated domain values?
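
The check described above (are the field values actually present in the data dictionary tables?) can be sketched without arcpy. This is only an illustration under stated assumptions: the tables are represented as plain Python sets, the field-to-table mapping is an assumed subset, and `undefined_values` is a hypothetical helper, not part of the GeMS tools.

```python
# Hypothetical in-memory stand-ins for the data dictionary tables.
dictionary_terms = {
    "Glossary": {"contact", "fault", "certain"},
    "DescriptionOfMapUnits": {"Qal", "Tbr"},
    "DataSources": {"DAS1", "DAS2"},
}

# Assumed mapping from a field to the table that should define its values.
field_to_table = {
    "Type": "Glossary",
    "ExistenceConfidence": "Glossary",
    "MapUnit": "DescriptionOfMapUnits",
    "DataSourceID": "DataSources",
}


def undefined_values(rows):
    """Yield each distinct (field, value) whose value is missing from its table."""
    seen = set()
    for row in rows:
        for field, value in row.items():
            table = field_to_table.get(field)
            if table is None or value is None:
                continue  # field is not dictionary-controlled, or is empty
            if value not in dictionary_terms[table] and (field, value) not in seen:
                seen.add((field, value))
                yield field, value


rows = [
    {"Type": "contact", "ExistenceConfidence": "certain", "MapUnit": "Qal", "DataSourceID": "DAS1"},
    {"Type": "scarp", "ExistenceConfidence": "certain", "MapUnit": "Kgr", "DataSourceID": "DAS1"},
]
missing = list(undefined_values(rows))
print(missing)
```

If a check like this comes back empty for a database and the metadata errors still appear, that would point toward the tool rather than the data.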

alwunder commented 1 year ago

OK, gotcha on the first part, that makes sense and was really no trouble, just wanted to make sure I wasn't missing something.

For 2, I think it's a problem with the tool not building the enumerated domain values, or something along those lines, as you suggested. I thought maybe it was because the gdb was empty that it wasn't working correctly, but I have tested the metadata tool on a bunch of complete, level 3 validated gdbs and the errors persist. When I add the TEST strings back to the definitions, they come through in the output xml file and I get no metadata validation errors from MP.
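
To narrow down which attributes trigger the error, the output XML can be scanned for attributes that have no domain element at all. The sketch below assumes the output uses the standard FGDC CSDGM short tag names (`detailed`, `enttypl`, `attr`, `attrlabl`, `attrdomv`); the sample document and the helper `attrs_missing_domain` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny made-up sample in FGDC CSDGM short-name form.
SAMPLE = """<metadata>
  <eainfo>
    <detailed>
      <enttyp><enttypl>ContactsAndFaults</enttypl></enttyp>
      <attr>
        <attrlabl>Type</attrlabl>
        <attrdef>Kind of feature</attrdef>
      </attr>
      <attr>
        <attrlabl>Label</attrlabl>
        <attrdef>Label text</attrdef>
        <attrdomv><udom>Free text</udom></attrdomv>
      </attr>
    </detailed>
  </eainfo>
</metadata>"""


def attrs_missing_domain(root):
    """Return (entity, attribute) labels for attr elements with no attrdomv child."""
    missing = []
    for detailed in root.iter("detailed"):
        entity = detailed.findtext("enttyp/enttypl", default="?")
        for attr in detailed.findall("attr"):
            if attr.find("attrdomv") is None:
                missing.append((entity, attr.findtext("attrlabl", default="?")))
    return missing


root = ET.fromstring(SAMPLE)
result = attrs_missing_domain(root)
print(result)
```

For a real output file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. Note that this only catches a missing `attrdomv` element; an element that is present but empty would need a separate check.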

ethoms-usgs commented 1 year ago

I can't reproduce those errors. Can you send me an example gdb?