Open epifanio opened 3 years ago
Regarding the last part of the issue, the keywords problem: to distinguish between ISO keywords with and without a thesaurus_name, would it make sense to have a column (which can be empty) to specify the 'dialect'/'flavour' of the ISO record, in my case GCMD? The core code could then use some logic to distinguish between keywords with and without a thesaurus_name, which would affect the transformation into a specific output profile.
I may have found a little hack to tune the output the way I needed, by modifying 'dif.py':
# keywords
val = util.getqattr(result, context.md_core_model['mappings']['pycsw:Keywords'])
if val:
    for kw in val.split(','):
        # GCMD science keywords are '>'-separated hierarchies; strip the padding around each level
        values = [v.strip() for v in kw.split('>')]
        # Category, Topic and Term are all read below, so require at least three levels
        # (the original check of >= 2 raised an IndexError on two-level keywords)
        if len(values) >= 3:
            parameters = etree.SubElement(node, util.nspath_eval('dif:Parameters', NAMESPACES))
            etree.SubElement(parameters, util.nspath_eval('dif:Category', NAMESPACES)).text = values[0]
            etree.SubElement(parameters, util.nspath_eval('dif:Topic', NAMESPACES)).text = values[1]
            etree.SubElement(parameters, util.nspath_eval('dif:Term', NAMESPACES)).text = values[2]
            # any remaining levels become Variable_Level_1, Variable_Level_2, ...
            for i, v in enumerate(values[3:]):
                etree.SubElement(parameters, util.nspath_eval(f'dif:Variable_Level_{i+1}', NAMESPACES)).text = v
        else:
            # not a hierarchy: keep it as a plain keyword
            etree.SubElement(node, util.nspath_eval('dif:Keywords', NAMESPACES)).text = kw
Note that this only works for my specific case, where I am sure that all the GCMD keywords I need to parse use the > symbol as a separator.
The code above will return:
<dif:Parameters>
  <dif:Category>Earth Science</dif:Category>
  <dif:Topic>Atmosphere</dif:Topic>
  <dif:Term>Atmospheric radiation</dif:Term>
  <dif:Variable_Level_1>Reflectance</dif:Variable_Level_1>
</dif:Parameters>
from an ISO keyword like:
<gmd:keyword>
  <gco:CharacterString>
    EARTH SCIENCE > Atmosphere > Atmospheric Winds > Surface Winds
  </gco:CharacterString>
</gmd:keyword>
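For anyone who wants to try the splitting logic outside pycsw, here is a minimal standalone sketch of the same idea using only the standard library. It is not the pycsw implementation: `add_keyword` and `dif_tag` are hypothetical helper names, and the DIF namespace URI is an assumption that should be checked against the `NAMESPACES` dict in dif.py.

```python
import xml.etree.ElementTree as ET

# Assumption: DIF namespace URI; verify against NAMESPACES in dif.py.
DIF_NS = "http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
ET.register_namespace("dif", DIF_NS)

# The first three levels of a GCMD science keyword hierarchy.
LEVELS = ("Category", "Topic", "Term")


def dif_tag(name):
    """Return the namespace-qualified tag for a DIF element (hypothetical helper)."""
    return f"{{{DIF_NS}}}{name}"


def add_keyword(node, keyword):
    """Append dif:Parameters for a '>'-separated hierarchy, else dif:Keywords."""
    values = [v.strip() for v in keyword.split(">")]
    if len(values) < 3:
        # Not enough levels for Category/Topic/Term: keep it as a plain keyword.
        elem = ET.SubElement(node, dif_tag("Keywords"))
        elem.text = keyword.strip()
        return elem
    parameters = ET.SubElement(node, dif_tag("Parameters"))
    for name, value in zip(LEVELS, values[:3]):
        ET.SubElement(parameters, dif_tag(name)).text = value
    # Remaining levels become Variable_Level_1, Variable_Level_2, ...
    for i, value in enumerate(values[3:], start=1):
        ET.SubElement(parameters, dif_tag(f"Variable_Level_{i}")).text = value
    return parameters


root = ET.Element(dif_tag("DIF"))
add_keyword(root, "EARTH SCIENCE > Atmosphere > Atmospheric Winds > Surface Winds")
add_keyword(root, "wind")
print(ET.tostring(root, encoding="unicode"))
```

The fallback to `dif:Keywords` for anything with fewer than three levels is a design choice in this sketch, so that short or free-text keywords never hit a missing index.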
@epifanio is this still an issue?
Description
Problem: mapping of ISO records to DIF (using GCMD DIF type/subtype vocabulary).
Given an ISO-compliant metadata Record, I encountered some issues in the mapping to DIF at different levels. Listing two examples:
Environment
Steps to Reproduce
Indexing the following ISO Record:
Results in the following DIF profile
The DIF output doesn't match the information available in the original ISO source.
Data Access
Currently the protocols are just the same as the ISO records.
Current DIF output
Expected DIF9.7 output
Dataset landing page
Current ISO output
As Related_URL using type DATASET LANDING PAGE.
Expected DIF output
Current DIF output:
Expected DIF output
Additional Information
There are other issues related to how the ISO keywords are mapped to DIF, in particular the GCMD Science Keywords.
In ISO we have:
see reference ISO
This is mapped from apiso:Subject into csw:Keywords, which is then mapped to dif:Keyword in dif.py.
In principle it should instead be mapped into DIF 9 as Parameters (with subelements) when the thesaurus name is GCMD, and as Keyword (string) for any other thesaurus name.
As this is too complicated, I would try to handle only the GCMD thesaurus, so I need to map all ISO keyword entries to Parameters in this structure:
See http://metadata.nersc.no/oai?verb=ListRecords&metadataPrefix=dif for example
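To make the thesaurus-aware variant described above a bit more concrete, here is a rough sketch of how the dispatch could look if the thesaurus name were available next to each keyword. This is only an illustration under assumptions: pycsw stores keywords as a flat string, so the `(keyword, thesaurus_name)` pairs, the `is_gcmd` check, and the example thesaurus title are all hypothetical.

```python
def is_gcmd(thesaurus_name):
    """Hypothetical check: treat any thesaurus title that mentions GCMD as GCMD."""
    return bool(thesaurus_name) and "gcmd" in thesaurus_name.lower()


def classify_keywords(pairs):
    """Split (keyword, thesaurus_name) pairs into DIF Parameters and plain Keywords.

    Returns two lists: hierarchy levels destined for dif:Parameters, and
    strings destined for dif:Keyword.
    """
    parameters, keywords = [], []
    for keyword, thesaurus_name in pairs:
        levels = [v.strip() for v in keyword.split(">")]
        # Only GCMD keywords with Category, Topic and Term become Parameters.
        if is_gcmd(thesaurus_name) and len(levels) >= 3:
            parameters.append(levels)
        else:
            keywords.append(keyword.strip())
    return parameters, keywords


# Assumed input: one GCMD keyword and one keyword without a thesaurus.
pairs = [
    ("EARTH SCIENCE > Atmosphere > Atmospheric Winds > Surface Winds", "GCMD Science Keywords"),
    ("wind", None),
]
parameters, keywords = classify_keywords(pairs)
print(parameters)
print(keywords)
```

Keeping the classification separate from the XML writing would let dif.py keep its current output code and only change which branch each keyword takes.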