ItaloBorrelli closed this issue 3 months ago.
Hey Italo, thanks for reporting. Could you provide an example identifier snippet? I'm assuming you are not including the full URL in the code field?
Sure thing! Looking at this XML, we have:
```xml
<mdb:MD_Metadata...
  <mdb:identificationInfo>
    <mri:MD_DataIdentification>
      <cit:citation>
        <cit:CI_Citation>
          ...
          <cit:identifier>
            <mcc:MD_Identifier>
              <mcc:authority>
                <cit:CI_Citation>
                  <cit:title>
                    <gco:CharacterString>DataCite</gco:CharacterString>
                  </cit:title>
                </cit:CI_Citation>
              </mcc:authority>
              <mcc:code>
                <gco:CharacterString>10.34943/4831b0a0-7f01-4863-b44f-2ef0729d45ef</gco:CharacterString>
              </mcc:code>
              <mcc:codeSpace>
                <gco:CharacterString>doi.org</gco:CharacterString>
              </mcc:codeSpace>
            </mcc:MD_Identifier>
```
I checked with my metadata team because I wasn't confident, and they are sure that this is a valid way of providing the DOI within ISO 19115 XML.
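To see what a harvester would actually read from a record like the one above, the three identifier parts (authority title, code, codeSpace) can be pulled out with Python's standard `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions: the namespace URIs in `NS` are our assumption based on the ISO 19115-3 schemas (check them against the `xmlns` declarations on your own `<mdb:MD_Metadata>` root), and `read_identifiers` is a hypothetical helper, not part of ckanext-spatial.

```python
import xml.etree.ElementTree as ET

# Assumed ISO 19115-3 namespace URIs; verify against your records' xmlns declarations.
NS = {
    "mdb": "http://standards.iso.org/iso/19115/-3/mdb/2.0",
    "mcc": "http://standards.iso.org/iso/19115/-3/mcc/1.0",
    "cit": "http://standards.iso.org/iso/19115/-3/cit/2.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
}


def _text(elem, path):
    """Return the stripped text at `path` below `elem`, or None if it is absent."""
    found = elem.find(path, NS)
    if found is None or found.text is None:
        return None
    return found.text.strip()


def read_identifiers(xml_text):
    """Hypothetical helper: collect authority, code and codeSpace of every MD_Identifier."""
    root = ET.fromstring(xml_text)
    identifiers = []
    for ident in root.iterfind(".//mcc:MD_Identifier", NS):
        identifiers.append({
            "authority": _text(ident, "mcc:authority/cit:CI_Citation/cit:title/gco:CharacterString"),
            "code": _text(ident, "mcc:code/gco:CharacterString"),
            "code_space": _text(ident, "mcc:codeSpace/gco:CharacterString"),
        })
    return identifiers


# Trimmed copy of the identifier above, wrapped in a root element that declares the namespaces.
_XMLNS = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
SAMPLE = f"""<mdb:MD_Metadata {_XMLNS}>
  <mcc:MD_Identifier>
    <mcc:authority>
      <cit:CI_Citation>
        <cit:title><gco:CharacterString>DataCite</gco:CharacterString></cit:title>
      </cit:CI_Citation>
    </mcc:authority>
    <mcc:code><gco:CharacterString>10.34943/4831b0a0-7f01-4863-b44f-2ef0729d45ef</gco:CharacterString></mcc:code>
    <mcc:codeSpace><gco:CharacterString>doi.org</gco:CharacterString></mcc:codeSpace>
  </mcc:MD_Identifier>
</mdb:MD_Metadata>"""

if __name__ == "__main__":
    print(read_identifiers(SAMPLE))
```

For this record the bare DOI arrives in `code` and `doi.org` in `code_space`, so nothing in `code` alone says "DOI" unless the code is matched by pattern.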
It is valid, yes. The problem is that there are also other valid ways to represent it. For example, one could put the full URI in the code field, which is frequently done with versions of the ISO standard other than 19115-3. That would look like https://doi.org/10.34943/4831b0a0-7f01-4863-b44f-2ef0729d45ef, for example.
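Since the two forms carry the same information, converting between them is mostly URL splitting. A minimal sketch using `urllib.parse.urlsplit`; `split_identifier` and `join_identifier` are hypothetical helpers, and the sketch assumes the DOI in the URL path needs no percent-decoding. It only rearranges the parts: a URL-shaped code is split whether or not it actually resolves to anything.

```python
from urllib.parse import urlsplit


def split_identifier(code, code_space=None):
    """Return (code_space, code) in the split form.

    If `code` is already a full http(s) URI such as https://doi.org/10.x/y,
    its host becomes the codeSpace and its path (without the leading slash)
    becomes the code. Otherwise the pair is returned unchanged.
    """
    parts = urlsplit(code)
    if parts.scheme in ("http", "https") and parts.netloc:
        return parts.netloc, parts.path.lstrip("/")
    return code_space, code


def join_identifier(code_space, code, scheme="https"):
    """Build the full-URI form; a codeSpace that already has a scheme is kept as-is."""
    base = code_space.rstrip("/")
    if not urlsplit(base).scheme:
        base = f"{scheme}://{base}"
    return f"{base}/{code}"


doi = "10.34943/4831b0a0-7f01-4863-b44f-2ef0729d45ef"
print(split_identifier("https://doi.org/" + doi))
print(join_identifier("doi.org", doi))
```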
I can make a small change that I think will accommodate your use case, but the https:// is a bit of a problem because it can be hard to know whether it should be included or not. In the case of doi.org it is probably safe to assume the identifier can be turned into a full URL. For datasets from GLOS, on the other hand, it is less clear: their identifiers look like URLs but in fact do not link to anything.
Do you mean that if we add http or https to it, it should be recognized as the DOI URL, and that it isn't working now because the protocol is missing when matching? I haven't tried that out yet, but I'll give it a go shortly. If http(s?)://doi.org is valid, I can check whether it would be OK for me to use that for the codeSpace instead of just doi.org.
That would make life easier, yes. Alternatively, 'code' could be the full URL while 'codeSpace' is 'doi.org' and 'authority' is 'DataCite'. There are many ways to do this; the ISO recommendation seems to be either to break it up into the three parts or to use a full URL for 'code'.
We have added the protocol to the codeSpace of the citation identifier, which I think is what we decided was the solution. You can see it here:
```xml
<mcc:MD_Identifier>
  <mcc:authority>
    <cit:CI_Citation>
      <cit:title>
        <gco:CharacterString>DataCite</gco:CharacterString>
      </cit:title>
    </cit:CI_Citation>
  </mcc:authority>
  <mcc:code>
    <gco:CharacterString>10.34943/d123e437-f06f-48f6-87a0-121a938ef792</gco:CharacterString>
  </mcc:code>
  <mcc:codeSpace>
    <gco:CharacterString>https://doi.org</gco:CharacterString>
  </mcc:codeSpace>
</mcc:MD_Identifier>
</cit:identifier>
```
Is this sufficient? Is there work that needs to be done on the harvester to accommodate this as well?
*edited to remove question from code block
This appears to be resolved.
Should be fixed with: https://github.com/cioos-siooc/ckanext-spatial/pull/44
Current behaviour: if the identifier `<code>` has text matched by the regex here, then it will be used for the citation.

Expected behaviour: if `<codeSpace>` is doi.org, then the `<code>` value should be used as the DOI citation identifier.
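One way to express the expected behaviour is to key the decision off codeSpace instead of running a regex over code. This is a minimal Python sketch under our own assumptions, not the code from the linked PR: `doi_from_identifier` and `citation_doi` are hypothetical names, identifiers are assumed to be dicts with `code` and `code_space` keys, and a codeSpace of doi.org is accepted with or without an http(s):// prefix so that the record using `https://doi.org` also matches. URL-looking codes under any other codeSpace (such as the GLOS case) are not treated as DOIs.

```python
import re
from typing import Iterable, Mapping, Optional

# Hypothetical patterns: codeSpace "doi.org", optionally with http(s):// and a trailing slash.
_DOI_CODESPACE = re.compile(r"^(?:https?://)?doi\.org/?$", re.IGNORECASE)
# Resolver prefix to strip if the full URL was put in <code> as well.
_DOI_URL_PREFIX = re.compile(r"^https?://doi\.org/", re.IGNORECASE)


def doi_from_identifier(code: Optional[str], code_space: Optional[str]) -> Optional[str]:
    """Return the bare DOI if this identifier's codeSpace is doi.org, else None."""
    if not code or not code_space:
        return None
    if not _DOI_CODESPACE.match(code_space.strip()):
        return None
    return _DOI_URL_PREFIX.sub("", code.strip(), count=1)


def citation_doi(identifiers: Iterable[Mapping[str, Optional[str]]]) -> Optional[str]:
    """Pick the first identifier whose codeSpace marks it as a DOI."""
    for ident in identifiers:
        doi = doi_from_identifier(ident.get("code"), ident.get("code_space"))
        if doi:
            return doi
    return None
```

Keying off codeSpace keeps the harvester from guessing based on how the code string looks, which is exactly the ambiguity raised for URL-shaped identifiers that do not resolve.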