Open rshewitt opened 2 months ago
iso19139ngdc schema. I isolated the validation logic locally, and the iso19139ngdc schema can't seem to find the MD_Metadata root element. Looking more into this.
XML validation can't find where MD_Metadata is declared in the mdb namespace, which explains the issue mentioned in my last comment. mdb is declared via:
<mdb:MD_Metadata xmlns:mdb="http://standards.iso.org/iso/19115/-3/mdb/2.0">
<!-- other content -->
</mdb:MD_Metadata>
^ all the ISO19115-3 fixtures in mdtranslator do this. The root element declaration Chris MacDermaid gave me imports the namespace like that too.
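For context on why that declaration matters: any namespace-aware parser expands the mdb: prefix to its full URI and looks up MD_Metadata under that namespace, so a validator that can't find the element in the namespace fails exactly as described. A quick stdlib check (illustrative only, not catalog code) shows the expansion:

```python
import xml.etree.ElementTree as ET

doc = """<mdb:MD_Metadata xmlns:mdb="http://standards.iso.org/iso/19115/-3/mdb/2.0">
</mdb:MD_Metadata>"""

root = ET.fromstring(doc)
# ElementTree reports the tag in Clark notation: {namespace-uri}localname
print(root.tag)
# {http://standards.iso.org/iso/19115/-3/mdb/2.0}MD_Metadata
```

This is the fully qualified name the schema has to declare for validation to succeed.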
I use an XML parsing extension in VS Code. It processes the document according to the referenced schemas and flags anything wrong. Importing the mdb
namespace as shown above causes an error. However, when I removed that declaration and added http://standards.iso.org/iso/19115/-3/mdb/2.0 https://standards.iso.org/iso/19115/-3/mdb/2.0/metadataBase.xsd
to xsi:schemaLocation
my XML processor stopped complaining. Like so...
<mdb:MD_Metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://standards.iso.org/iso/19115/-3/mdb/2.0 https://standards.iso.org/iso/19115/-3/mdb/2.0/metadataBase.xsd">
<!-- other content -->
</mdb:MD_Metadata>
^ this workaround doesn't resolve the underlying issue with XML validation in Python, though
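One likely reason the workaround doesn't carry over: xsi:schemaLocation is only a hint, and lxml (the usual Python XSD validator) ignores it; the schema has to be loaded and applied explicitly. A minimal, self-contained sketch of that pattern, where the inline XSD is a hypothetical stand-in for metadataBase.xsd trimmed to just the root element declaration:

```python
from lxml import etree

MDB_NS = "http://standards.iso.org/iso/19115/-3/mdb/2.0"

# Hypothetical stand-in for metadataBase.xsd: it only declares the
# MD_Metadata root element so the example is self-contained.
xsd = f"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="{MDB_NS}" elementFormDefault="qualified">
  <xs:element name="MD_Metadata"/>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(xsd))

doc = etree.fromstring(f'<mdb:MD_Metadata xmlns:mdb="{MDB_NS}"/>')

# lxml does NOT follow xsi:schemaLocation on its own; the schema
# must be supplied explicitly, as here.
print(schema.validate(doc))  # True
```

So even with the schemaLocation hint that satisfies the VS Code extension, the Python side still needs the real metadataBase.xsd (and its imports) wired up explicitly.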
Okay, so a breakdown of some relevant ISO standards (source).
Catalog uses the NGDC-specific implementation of ISO19139, labelled as iso19139ngdc, for ISO19115 validation (source).
Pausing on this ticket. We need a group discussion on the metadata we manage and where we want to go. Huddled with @btylerburton & @FuhuXia on getting a distribution count of the ISO standards we currently manage (e.g. ISO19115, ISO19115-1, ISO19115-2, ISO19115-3).
Have a script ready to get all WAF/WAF-collection harvest sources, their dataset counts, and a sample XML file from each for standards analysis.
Here is the result. result.txt
Out of the 470 WAF/WAF-collection harvest sources, this is the breakdown of documents per schema. So we need to be able to transform those 3 schemas into DCATUS for harvester 2.0. A sample XML was taken from each collection; the schema of that XML was assumed to apply to the entire collection.
calculated via script
@FuhuXia is getting me a list of all the data-provider XML URLs we currently harvest. From that I'll count how many schemas we process.
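One lightweight way to do that count (a sketch only; the actual script may differ) is to key off the root element's namespace, since each standard puts its root element in a distinctive namespace. The mapping below is our own assumption, not taken from the script:

```python
import xml.etree.ElementTree as ET

# Assumed mapping from root-element namespace URI to metadata standard;
# the URIs are the real published ones, the labels are ours.
NS_TO_SCHEMA = {
    "http://www.isotc211.org/2005/gmd": "ISO19139 (ISO19115/19115-2)",
    "http://standards.iso.org/iso/19115/-3/mdb/2.0": "ISO19115-3",
}

def detect_schema(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    # root.tag is "{namespace-uri}localname"; peel off the URI
    ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    return NS_TO_SCHEMA.get(ns, "FGDC or unknown (no ISO namespace)")

sample = '<mdb:MD_Metadata xmlns:mdb="http://standards.iso.org/iso/19115/-3/mdb/2.0"/>'
print(detect_schema(sample))  # ISO19115-3
```

Running detect_schema over one sample document per harvest source and tallying the results would give the ISO vs. FGDC breakdown directly.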
script updated to show ISO vs FGDC.
New list attached. result.2.txt
Moving this to H2.0 backlog because of reprioritization.
User Story
In order to identify changes in content between an ISO19115 document and its transformed counterpart (DCATUS), datagov wants to harvest an ISO19115 document and its transformed counterpart on catalog-dev.
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
Background
Resources
[Any helpful contextual notes or links to artifacts/evidence, if needed]
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
Sketch
[Notes or a checklist reflecting our understanding of the selected approach]