Closed yvanlebras closed 1 year ago
@yvanlebras This metadata sheet is not valid. You should liaise with @wheintz; he may want to update it. The first error reported during validation relates to the citation 'edition', which should be a string. This metadata was generated with geoflow in the past, when the edition was set with a datetime object, which is not valid for the ISO/OGC schemas. The current code prevents this by applying as.character when using setEdition. However, decoding an existing XML will not coerce the date to character: it sees a gco:DateTime, so it converts it to the equivalent R binding (POSIXt).
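A minimal sketch of a workaround for records read back from XML. The helper edition_as_string is hypothetical (it is not part of geometa), and the field path in the commented usage is an assumption about the object layout that should be checked on the actual record.

```r
# Hypothetical helper (not part of geometa): return the edition as a string,
# which is what the schema expects for the citation edition.
edition_as_string <- function(edition) {
  if (inherits(edition, "POSIXt")) {
    # A date-time decoded from gco:DateTime is written in ISO 8601 form (UTC)
    format(as.POSIXct(edition), "%Y-%m-%dT%H:%M:%S", tz = "UTC")
  } else {
    as.character(edition)
  }
}

# Possible use on a decoded record (the field path is an assumption):
# cit <- md$identificationInfo[[1]]$citation
# cit$setEdition(edition_as_string(cit$edition))
```

Whether the date-time form or a plain version label is the right edition value is a content decision for the metadata author; the sketch only removes the type mismatch.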
Thank you so much Emmanuel for your rapid and detailed feedback! So the ISO is not valid! "Oh well done, Wilfried ;)" I will come back to him. A particular point is that I am testing several metadata documents, from several geonetwork catalogs, and for now, none is valid... This seems strange to me, but maybe it can be explained by several reasons... One question I have is that on the PNDB side, we can't upload EML metadata documents if they are not valid... And on geonetwork, it appears you can upload a metadata document that is not valid regarding the ISO 19139 spec? Sorry if this is "just" something I am discovering because I am a GIS newbie ;) AND thank you for geometa, really easy to use! Looking forward to creating a related Galaxy tool so we can execute these tests in batch mode quite easily (I don't think we can execute the geometa ISO compliance test on URLs of ISO XML docs, can we?)!
To be precise, it was not valid because of a past issue in geometa ;-) in the geoflow action dedicated to ISO 19115 production.
In Geonetwork, validation is weird. I validate ISO 19115/19139 in geometa using XSD schema definitions. In geonetwork as well, but they don't use the same version of the XSD schemas specifications. I should probably look into that and compare schema uses, but I didn't have time to dig into that.
With geometa, you can use the readISO19139 function, which accepts a file or url, and then test whether the records are valid, but indeed you will have to read the XML to do that.
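For batch use, the read-then-validate step could be sketched as below. This is an assumption-laden sketch: validate_records is a hypothetical helper, and it assumes that md$validate() returns a single TRUE/FALSE for XSD validation.

```r
# Hypothetical helper: validate several ISO 19139 records given as URLs.
# `reader` defaults to geometa::readISO19139; it is a parameter only so that
# the loop can be tried out without network access.
validate_records <- function(urls, reader = geometa::readISO19139) {
  vapply(urls, function(u) {
    tryCatch({
      md <- reader(url = u)     # reads and decodes the XML
      isTRUE(md$validate())     # assumed to return TRUE/FALSE
    }, error = function(e) NA)  # NA when the record could not be read
  }, logical(1))
}

# Example call (needs geometa and network access):
# validate_records(c(url_1, url_2))
```

The result is a logical vector named by URL, where NA separates "could not be read" from "read but invalid".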
OK, we have time to dig into that as this is our job ;) I will keep you informed!
Thank you again for this amazing package and your help!!!!
For readISO19139, normally something like
readISO19139(url="http://indores-tmp.in2p3.fr/geonetwork/srv/api/records/112ebeea-e79c-422c-8a43-a5a8323b446b/formatters/xml?approved=true")
is ok?
I have an error message trying on this geonetwork page http://indores-tmp.in2p3.fr/geonetwork/srv/fre/catalog.search#/metadata/112ebeea-e79c-422c-8a43-a5a8323b446b :
Error: XML content does not seem to be XML: '/gmd",
"@xmlns:gml": "http://www.opengis.net/gml/3.2",
"@xmlns:gts": "http://www.isotc211.org/2005/gts",
"@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
"@xsi:schemaLocation": "http://www.isotc211.org/2005/gmd http://schemas.opengis.net/csw/2.0.2/profiles/apiso/1.0.0/apiso.xsd",
"gmd:fileIdentifier": {"gco:CharacterString": {
"@xmlns:gco": "http://www.isotc211.org/2005/gco",
"#text": "112ebeea-e79c-422c-8a43-a5a8323b446b"
}},
"gmd:language": {"gco:CharacterString": {
"@xmlns:gco": "http://www.isotc211.org/2005/gco",
"#text": "fre"
}},
"gmd:characterSet": {"gmd:MD_CharacterSetCode": {
"@codeList": "http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_CharacterSetCode",
"@codeListValue": "utf8"
}},
"gmd:hierarchyLevel": {"gmd:MD_ScopeCode": {
"@codeList": "http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_ScopeCode",
"@codeListValue": "dataset"
Hum, on that, geometa can't do anything because it doesn't support JSON-LD format for ISO 19115-3. Indeed you will have to look at the XML format: http://indores-tmp.in2p3.fr/geonetwork/srv/api/records/112ebeea-e79c-422c-8a43-a5a8323b446b/formatters/xml?approved=true
But indeed it fails, probably a matter of the Accept content type... Weird behavior from Geonetwork: given the XML formatter, it should (at least) align the download format. I've just pushed a change in readISO19139 to force text/xml as the Accept header. Can you reinstall and try?
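To check outside geometa what the server returns for a given Accept header, something like the following could be used. It is only a sketch: fetch_iso_xml and looks_like_xml are hypothetical helpers built on httr and XML, not the code that was pushed to readISO19139.

```r
# TRUE when the body starts like XML rather than JSON
looks_like_xml <- function(txt) grepl("^[[:space:]]*<", txt)

# Hypothetical helper: fetch a record while asking explicitly for XML
fetch_iso_xml <- function(url, accept = "application/xml") {
  resp <- httr::GET(url, httr::add_headers(Accept = accept))
  httr::stop_for_status(resp)
  txt <- httr::content(resp, as = "text", encoding = "UTF-8")
  if (!looks_like_xml(txt)) {
    stop("The server did not return XML for Accept: ", accept)
  }
  XML::xmlParse(txt, asText = TRUE)
}

# Possible use (record_url stands for the formatter URL of the record):
# xml <- fetch_iso_xml(record_url)
# md <- geometa::ISOMetadata$new(xml = xml)
```

Calling it once with "text/xml" and once with "application/xml" shows which header value the catalogue honours.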
more likely !!
I just updated geometa to version 0-6-6... and I get the same error... but maybe this is not the latest version, with your push? I ran install.packages("geometa")
No, you should install from GitHub, using the remotes package:
remotes::install_github("eblondel/geometa")
... I am testing with install_github("eblondel/geometa") to have the latest version
Same, and apparently I have the latest version of geometa:
> library("geometa")
> readISO19139(url="http://indores-tmp.in2p3.fr/geonetwork/srv/api/records/112ebeea-e79c-422c-8a43-a5a8323b446b/formatters/xml?approved=true")
Error: XML content does not seem to be XML: '/gmd",
"@xmlns:gml": "http://www.opengis.net/gml/3.2",
"@xmlns:gts": "http://www.isotc211.org/2005/gts",
"@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
"@xsi:schemaLocation": "http://www.isotc211.org/2005/gmd http://schemas.opengis.net/csw/2.0.2/profiles/apiso/1.0.0/apiso.xsd",
"gmd:fileIdentifier": {"gco:CharacterString": {
"@xmlns:gco": "http://www.isotc211.org/2005/gco",
"#text": "112ebeea-e79c-422c-8a43-a5a8323b446b"
}},
"gmd:language": {"gco:CharacterString": {
"@xmlns:gco": "http://www.isotc211.org/2005/gco",
"#text": "fre"
}},
"gmd:characterSet": {"gmd:MD_CharacterSetCode": {
"@codeList": "http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_CharacterSetCode",
"@codeListValue": "utf8"
}},
"gmd:hierarchyLevel": {"gmd:MD_ScopeCode": {
"@codeList": "http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_ScopeCode",
"@codeListValue": "dataset"
> #install.packages("ncdf4")
> remotes::install_github("eblondel/geometa")
Skipping install of 'geometa' from a github remote, the SHA1 (10e8a1e0) has not changed since last install.
Use `force = TRUE` to force installation
OK, reinstall: it should work with application/xml, but apparently GN doesn't work with text/xml.
Reinstalled, but same error, sorry... I will look at it in the coming days! No urgency!! THANK YOU so much Emmanuel!!!!! Have a nice night ;)
That sounds like a cache issue. If you use RStudio, make sure to restart the session, and reinstall geometa afterwards. This code works for me:
md = readISO19139(url = "http://indores-tmp.in2p3.fr/geonetwork/srv/api/records/112ebeea-e79c-422c-8a43-a5a8323b446b/formatters/xml?approved=true")
Of course ;) Restarting RStudio or Windows allows the command to be executed! Thank you so much!!!!
So, I can use:
md <- readISO19139(url="http://indores-tmp.in2p3.fr/geonetwork/srv/api/records/112ebeea-e79c-422c-8a43-a5a8323b446b/formatters/xml?approved=true")
as
xml <- xmlParse("~/1_PNDB/0_USe_cases/ISO2EML_indores/112ebeea-e79c-422c-8a43-a5a8323b446b.xml")
md <- ISOMetadata$new(xml = xml)
before executing md$encode to know whether the ISO record is valid or not, is that it? And could the fact that the result is NO be because the ISO schema (ISO 19115-3) is not supported by geometa?
Not really, ISO 19115-3 is backward compatible with ISO 19115-1 and 19115-2; but here the Geonetwork XML formatter seems to deliver you an ISO 19115-1, given the schemas that are referenced.
The first validity issue in your example is the absence of a metadata contact. This can be seen when you try to validate the record:
md <- readISO19139(url="http://indores-tmp.in2p3.fr/geonetwork/srv/api/records/112ebeea-e79c-422c-8a43-a5a8323b446b/formatters/xml?approved=true")
md$validate()
Output:
Element '{http://www.isotc211.org/2005/gmd}dateStamp': This element is not expected. Expected is one of ( {http://www.isotc211.org/2005/gmd}hierarchyLevel, {http://www.isotc211.org/2005/gmd}hierarchyLevelName, {http://www.isotc211.org/2005/gmd}contact ) at line 14.
This is the raw output of the XML validator, and I admit it's hard to decode. But here, you have to look at the last element mentioned: contact. It is missing in your metadata. A valid metadata record should have included it as a tag with a "missing" attribute.
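Reading the last expected element off such a message can be scripted. The helper below is hypothetical and uses only base R; it assumes the message has the "Expected is ... ( ... )" shape shown above.

```r
# Hypothetical helper: extract the element names listed after "Expected is"
# in a raw XML validator message, without their namespace prefix.
expected_elements <- function(msg) {
  m <- regmatches(msg, regexpr("Expected is[^(]*\\(([^)]*)\\)", msg))
  if (length(m) == 0) return(character(0))
  inner <- sub("^[^(]*\\([[:space:]]*", "", m)   # drop "Expected is ... ("
  inner <- sub("[[:space:]]*\\)$", "", inner)    # drop the closing ")"
  parts <- trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
  sub("^\\{[^}]*\\}", "", parts)                 # drop "{namespace}"
}

msg <- "Element '{http://www.isotc211.org/2005/gmd}dateStamp': This element is not expected. Expected is one of ( {http://www.isotc211.org/2005/gmd}hierarchyLevel, {http://www.isotc211.org/2005/gmd}hierarchyLevelName, {http://www.isotc211.org/2005/gmd}contact ) at line 14."
tail(expected_elements(msg), 1)
```

For this message the candidates are hierarchyLevel, hierarchyLevelName and contact; the last one is the element to look at first.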
To look at the other validity issues (the more serious ones), you can just set the contact to NA: md$contact <- NA. A side note on this: we could have made geometa do this for us, but... on principle, geometa doesn't alter the metadata representation of your XML. If no contact is specified, it remains an empty list in the md object.
Next, if you try to validate again, this time it's serious:
md$validate()
Output:
Element '{http://www.isotc211.org/2005/gmd}referenceSystemInfo', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The type definition '{http://www.isotc211.org/2005/gmd}PT_FreeText_PropertyType', specified by xsi:type, is blocked or not validly derived from the type definition of the element declaration at line 24.
[geometa][WARN] Element 'text': This element is not expected. Expected is ( {http://www.isotc211.org/2005/gmd}MD_ReferenceSystem ) at line 25.
[geometa][WARN] Element 'text': This element is not expected. Expected is ( {http://www.isotc211.org/2005/gmd}CI_Citation ) at line 41.
[geometa][WARN] Element '{http://www.isotc211.org/2005/gmd}status', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The type definition '{http://www.isotc211.org/2005/gmd}PT_FreeText_PropertyType', specified by xsi:type, is blocked or not validly derived from the type definition of the element declaration at line 84.
[geometa][WARN] Element 'text': This element is not expected. Expected is ( {http://www.isotc211.org/2005/gmd}MD_ProgressCode ) at line 85.
[geometa][WARN] Element '{http://www.isotc211.org/2005/gmd}resourceMaintenance', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The type definition '{http://www.isotc211.org/2005/gmd}PT_FreeText_PropertyType', specified by xsi:type, is blocked or not validly derived from the type definition of the element declaration at line 197.
[geometa][WARN] Element 'text': This element is not expected. Expected is ( {http://www.isotc211.org/2005/gmd}MD_MaintenanceInformation ) at line 198.
[geometa][WARN] Element '{http://www.isotc211.org/2005/gmd}spatialRepresentationType', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The type definition '{http://www.isotc211.org/2005/gmd}PT_FreeText_PropertyType', specified by xsi:type, is blocked or not validly derived from the type definition of the element declaration at line 330.
[geometa][WARN] Element 'text': This element is not expected. Expected is ( {http://www.isotc211.org/2005/gmd}MD_SpatialRepresentationTypeCode ) at line 331.
[geometa][WARN] Element '{http://www.isotc211.org/2005/gmd}spatialResolution', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The type definition '{http://www.isotc211.org/2005/gmd}PT_FreeText_PropertyType', specified by xsi:type, is blocked or not validly derived from the type definition of the element declaration at line 334.
[geometa][WARN] Element 'text': This element is not expected. Expected is ( {http://www.isotc211.org/2005/gmd}MD_Resolution ) at line 335.
[geometa][WARN] Element '{http://www.isotc211.org/2005/gmd}topicCategory', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The type definition '{http://www.isotc211.org/2005/gmd}PT_FreeText_PropertyType', specified by xsi:type, is blocked or not validly derived from the type definition of the element declaration at line 355.
[geometa][WARN] Element 'text': This element is not expected. Expected is ( {http://www.isotc211.org/2005/gmd}MD_TopicCategoryCode ) at line 356.
[geometa][WARN] Element '{http://www.isotc211.org/2005/gmd}extent', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The type definition '{http://www.isotc211.org/2005/gmd}PT_FreeText_PropertyType', specified by xsi:type, is blocked or not validly derived from the type definition of the element declaration at line 359.
[geometa][WARN] Element 'text': This element is not expected. Expected is ( {http://www.isotc211.org/2005/gmd}EX_Extent ) at line 360.
[geometa][WARN] Element 'text': This element is not expected. Expected is ( {http://www.isotc211.org/2005/gmd}MD_Distribution ) at line 387.
[geometa][WARN] Element '{http://www.isotc211.org/2005/gmd}lineage': This element is not expected. Expected is ( {http://www.isotc211.org/2005/gmd}scope ) at line 426.
Most of the errors here come from invalid XML (more than from invalid ISO 19115), due to quotes inserted in the XML representation. Example:
<gmd:referenceSystemInfo>
"
<gmd:MD_ReferenceSystem>
"
<gmd:referenceSystemIdentifier>
"
<gmd:RS_Identifier>
<gmd:code>
<gco:CharacterString>WGS 84</gco:CharacterString>
</gmd:code>
</gmd:RS_Identifier>
"
</gmd:referenceSystemIdentifier>
"
</gmd:MD_ReferenceSystem>
"
</gmd:referenceSystemInfo>
I'm not sure whether Geonetwork was permissive and validated this kind of XML, whether it is a formatter issue, or whether bad manual metadata editing in GN led to these quotes, but it's clearly not delivering valid XML. These quotes should not be there, and geometa will logically interpret them as "text" XML elements, hence the validity errors.
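To see whether a given record carries such stray quotes, the non-blank text nodes that sit next to child elements can be listed. A minimal sketch with the XML package; find_stray_text is a hypothetical helper, and it assumes that no element of the record legitimately mixes text and child elements.

```r
# Hypothetical helper: list non-blank text nodes found in elements that also
# have child elements (mixed content), with the local name of their parent.
find_stray_text <- function(xml_text) {
  doc <- XML::xmlParse(xml_text, asText = TRUE)
  nodes <- XML::getNodeSet(doc, "//*[*]/text()[normalize-space(.) != '']")
  data.frame(
    parent = vapply(nodes, function(n) XML::xmlName(XML::xmlParent(n)), character(1)),
    text   = vapply(nodes, function(n) trimws(XML::xmlValue(n)), character(1)),
    stringsAsFactors = FALSE
  )
}

sample_xml <- '<gmd:a xmlns:gmd="http://www.isotc211.org/2005/gmd">"<gmd:b>x</gmd:b>"</gmd:a>'
find_stray_text(sample_xml)
```

An empty result on most records of a catalogue would point to a per-record editing problem rather than a formatter-wide one.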
Ouhahouhou! THANK YOU so much! Such an instructive message! So we "just" have to consider the last part of the last element mentioned! Really useful ;) Additionally, when we have such output, "just" the first warning/error is mentioned, not all of them!!! So if we fix it and try to re-validate, we will potentially see other warnings/errors! AMAZING!!! Thank you so much!
OK, so there are serious issues there. I will try other documents to better see whether this is "just" due to an isolated problem or something worse.... THANK YOU SO MUUUUUCCCCH Emmanuel !!!
Concerning
Not really, ISO 19115-3 is backward compatible with ISO 19115-1 and 19115-2; but here the Geonetwork XML formatter seems to deliver you an ISO 19115-1, given the schemas that are referenced.
Does this mean that the geonetwork I am testing does this for all documents, and that no ISO document from this geonetwork will validate, or am I missing something? (Sorry, I am trying to evaluate whether I am seeing potential "document-oriented issues" vs "geonetwork-wide ones".)
Of course ;) Restarting RStudio or Windows allows the command to be executed! Thank you so much!!!!
Never had problems with pip install package -U ;)
@yvanlebras The formatter is fine to deliver you an ISO 19115-1. What is not normal, and what you should investigate, is the issue of the quotes. You may have a look at various metadata (maybe in various catalogues) to see whether the XML formatter always delivers these quoted XML tags, or whether this is specific to some of your metadata (and maybe the result of a human error). I don't know. If you think there is a systemic issue with the Geonetwork XML formatter, you should liaise with the GN team and open a ticket (if one is not open already).
Hello all, apart from this quote issue (maybe due to delivering an empty string), what about the lack of the "contact" tag, as it is compulsory (no minOccurs and in a sequence)? If the data is missing in the source, does GN generate an invalid XML?
You may want to check the GN parameters, under Attributs nil reason, to see if this can be activated (never tried). I suggest you contact GN support for this. geometa is primarily designed to produce XML, and allows setting all fields/tags to make the XML valid, even if they are missing in content (with nil reason = missing). In that case we set an NA.
The geometa action set up in geoflow (see https://github.com/eblondel/geoflow/blob/master/inst/actions/geometa_create_iso_19115.R) gives a turn-key function to produce a complete, valid ISO 19139 XML metadata (and thus properly sets NAs where needed).
Hi Emmanuel!
Hope all is ok on your side! I am testing geometa as we want to propose a Galaxy tool to validate ISO 19115 XML documents and propose metadata translations. For now, @Marie59 created a conda recipe for geometa so it will be easier to install and use it on any OS and in any kind of environment! The next step is to look at creating a Galaxy tool using this recipe. So, I finally took some time to run some tests before proposing a relevant Galaxy tool!
I was testing on several geonetwork ISO files, and notably a Wilfried one, I think from here http://147.100.164.43:8080/geonetwork/srv/api/records/pyrenees
Here is my code:
Here is the result:
Not sure I am doing things the right way, and I am a geometa newbie, so maybe it is ""normal"" that ISO compliance is not ok... Or maybe I did something wrong....
Don't hesitate to say! ;)
Have a nice week, Yvan