GSA / data.gov

Main repository for the data.gov service
https://data.gov

ISO 19115 metadata is converted to lossy ISO 19139 format when ingested by CKAN #769

Closed: adborden closed this issue 5 months ago

adborden commented 5 years ago

Geospatial datasets in ISO 19115 lose metadata when CKAN harvests them and converts them to ISO 19139.

How to reproduce

Expected behavior

Actual behavior

adborden commented 5 years ago

It would be good to grab a specific example and list the metadata fields that are lost.
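One way to produce that list for a given record is to diff the harvested source XML against the converted output. The sketch below is a rough heuristic, not the CKAN or ckanext-spatial code path: it collects every non-blank element text and attribute value (ISO code lists usually live in attributes such as `codeListValue`) and reports the ones whose value appears nowhere in the converted record. Because it compares values rather than paths, it tolerates elements being renamed between schemas, but repeated or generic values (e.g. "unknown") can mask a real loss. The function names and the toy XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # Drop the "{namespace-uri}" prefix ElementTree puts on qualified names.
    return tag.rsplit("}", 1)[-1]


def text_values(xml_text: str) -> list[tuple[str, str]]:
    # Return (path, value) for every non-blank element text and attribute value.
    root = ET.fromstring(xml_text)
    found: list[tuple[str, str]] = []

    def walk(elem: ET.Element, prefix: str) -> None:
        name = local_name(elem.tag)
        path = f"{prefix}/{name}" if prefix else name
        for attr, value in elem.attrib.items():
            if value.strip():
                found.append((f"{path}/@{local_name(attr)}", value.strip()))
        if elem.text and elem.text.strip():
            found.append((path, elem.text.strip()))
        for child in elem:
            walk(child, path)

    walk(root, "")
    return found


def lost_values(source_xml: str, converted_xml: str) -> list[tuple[str, str]]:
    # A value counts as "lost" if it appears nowhere in the converted record,
    # regardless of which element it may have moved to.
    kept = {value for _, value in text_values(converted_xml)}
    return [(p, v) for p, v in text_values(source_xml) if v not in kept]


# Toy records (hypothetical namespaces and element names, not real ISO structure).
source = """<MD_Metadata xmlns="http://example.org/iso">
  <title>Streams</title>
  <keyword>hydrology</keyword>
  <lineage>Digitized from 1:24k maps</lineage>
</MD_Metadata>"""
converted = """<MD_Metadata xmlns="http://example.org/other">
  <citationTitle>Streams</citationTitle>
  <keyword>hydrology</keyword>
</MD_Metadata>"""
print(lost_values(source, converted))
# [('MD_Metadata/lineage', 'Digitized from 1:24k maps')]
```

Running it over a real 19115 source and its 19139 output would give a first-pass list of candidate fields to confirm by hand.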

adborden commented 5 years ago

Some background from @JJediny

Thread on FGDC call https://github.com/ckan/ckanext-spatial/tree/master/ckanext/spatial/validation/xml

johnjediny 3 hours ago: Current CKAN support for harvesting XML metadata.

johnjediny 3 hours ago: I think the current issue is how we're transforming our harvest sources into CKAN tables. The common schema we conform to should be updated to 19115-3 as a new planned common core, which would be compatible with pyCSW without risking the loss of fields in our current transformation.

johnjediny 3 hours ago: I'm not up to speed on the best way to approach that re: flask/spatial-ext/scheming-ext.

johnjediny 3 hours ago: If they want us to temporarily update the XSLT(s) we currently install with the current Java transformation, that might be the least-work fix.

johnjediny 3 hours ago: Here is the 19115 v3 schema project: https://github.com/ISO-TC211

johnjediny 3 hours ago: Arctic metadata project, re: existing mapping work: https://github.com/adiwg/mdJson-schemas (JSON schemas, examples, and templates for ADIwg metadata standards; website: http://www.adiwg.org/projects/)

johnjediny 3 hours ago: This was created by Micah.Wengren@noaa.gov: https://docs.google.com/spreadsheets/u/1/d/19L89sgSijpB9nvaWjgGTSo3m3r06zuXKq9C5QwcbgRs/edit

JJediny commented 5 years ago

The best long-term solution may be to convert to using 19115-3 as CKAN's core schema. This could be accomplished using https://github.com/ckan/ckanext-scheming with JSON schemas that have already been prepared for the 19115-3 FGDC profile (https://github.com/adiwg/mdJson-schemas/tree/master/schema). Since a good amount of work has already been done mapping DCAT to ISO 19115, it might be easier to map data.json into 19115: https://www.w3.org/2015/spatial/wiki/ISO_19115_-_DCAT_-_Schema.org_mapping
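To illustrate what mapping data.json into 19115 involves, here is a minimal crosswalk sketch. The three field-to-element pairs are a simplified, hypothetical subset written as plain ISO 19115 element names; real 19139 or 19115-3 encodings add namespaces and `gco:CharacterString` wrappers, and the W3C mapping page is the place to check actual correspondences. The helper also returns the data.json fields that have no crosswalk entry, which are exactly the fields such a transformation would silently drop.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified crosswalk: data.json field -> ISO 19115 element path.
# Not an official mapping; real encodings need namespaces and gco wrappers.
CROSSWALK = {
    "title": "identificationInfo/MD_DataIdentification/citation/CI_Citation/title",
    "description": "identificationInfo/MD_DataIdentification/abstract",
    "keyword": "identificationInfo/MD_DataIdentification/descriptiveKeywords/MD_Keywords/keyword",
}


def child(parent: ET.Element, tag: str) -> ET.Element:
    # Reuse an existing direct child with this tag, or create it.
    found = parent.find(tag)
    return found if found is not None else ET.SubElement(parent, tag)


def to_iso(dataset: dict) -> tuple[ET.Element, list[str]]:
    root = ET.Element("MD_Metadata")
    for field, path in CROSSWALK.items():
        if field not in dataset:
            continue
        *branch, leaf = path.split("/")
        parent = root
        for tag in branch:
            parent = child(parent, tag)
        raw = dataset[field]
        values = raw if isinstance(raw, list) else [raw]
        for value in values:  # list fields such as "keyword" repeat the leaf element
            ET.SubElement(parent, leaf).text = str(value)
    # Fields with no crosswalk entry are not carried over: the "lossy" part.
    unmapped = sorted(set(dataset) - set(CROSSWALK))
    return root, unmapped


record = {
    "title": "Streams",
    "description": "Stream centerlines",
    "keyword": ["hydrology", "rivers"],
    "accessLevel": "public",
}
iso, unmapped = to_iso(record)
print(ET.tostring(iso, encoding="unicode"))
print(unmapped)  # ['accessLevel']
```

Keeping the crosswalk as data rather than code makes the gaps easy to audit: any field missing from the table shows up in `unmapped` instead of disappearing quietly.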

nickumia-reisys commented 1 year ago

Is this still relevant? @GSA/data-gov-team (specifically @FuhuXia and @jbrown-xentity)

jbrown-xentity commented 1 year ago

This is still relevant. It is related to the transformation from CSDGM (the FGDC standard) into ISO, which is done in ckanext-geodatagov. Hopefully this will be addressed with the https://github.com/adiwg/mdTranslator tool, but that is still TBD.

hkdctol commented 1 year ago

@jbrown-xentity @nickumia-reisys Should I put this back in the icebox, then?

nickumia-reisys commented 1 year ago

I had brought this back up because of the harvesting requirement derivation/documentation. I wasn't sure if this was still a consideration we needed to address. Since it is, I'll pull the relevant aspects and we can move it back to the icebox.

gujral-rei commented 5 months ago

@jbrown-xentity, will we address this with https://github.com/GSA/data.gov/issues/4639 and https://github.com/GSA/data.gov/issues/4564?

jbrown-xentity commented 5 months ago

> @jbrown-xentity, will we address this with #4639 and #4564?

That is the hope. However, I have always believed this "error" was really just a poor UI representation of the data: all of the source data was accessible as-is, without transformation, alongside the CKAN view.