SEMICeu / iso-19139-to-dcat-ap

Reference XSLT-based implementation of GeoDCAT-AP
European Union Public License 1.2

Running the iso-19139-to-dcat-ap XSLT transformation in Notepad++ #29

Open jescriu opened 3 years ago

jescriu commented 3 years ago

Dear colleagues,

I tried to run the iso-19139-to-dcat-ap XSLT via Python (as proposed on this GitHub page), but I got an error message when parsing the input .xml metadata file obtained from a GetRecordById request to my catalogue (https://www.ide.cat/servei/catalunya/cataleg-idec/csw?request=GetRecordById&service=CSW&version=2.0.2&outputSchema=http://www.isotc211.org/2005/gmd&ElementSetName=full&ID=inspire-adreces).

After hitting this issue, I ran the XSLT transformation on my ISO .xml metadata files directly in Notepad++, using the XML Tools plugin (Plugins > XML Tools > XSL Transformation). However, I am not sure whether this is an appropriate way to run it, or whether the .rdf DCAT metadata files obtained this way are correct.

Anyway, I think running the XSLT script in Notepad++ could be a good way to spread its use among non-developer users.

Happy to get your feedback.

All the best,

Jordi

andrea-perego commented 3 years ago

Thanks for reporting this issue, @jescriu .

I've tried to run the transformation via the GeoDCAT-AP API demo, which uses the PHP implementation, and it works:

http://geodcat-ap.semic.eu/api/?outputSchema=extended&src=https%3A%2F%2Fwww.ide.cat%2Fservei%2Fcatalunya%2Fcataleg-idec%2Fcsw%3Frequest%3DGetRecordById%26service%3DCSW%26version%3D2.0.2%26outputSchema%3Dhttp%3A%2F%2Fwww.isotc211.org%2F2005%2Fgmd%26ElementSetName%3Dfull%26ID%3Dinspire-adreces&outputFormat=text%2Fhtml

It seems that the problem with the proposed Python script is that lxml's etree.parse function does not support HTTPS URLs.

A possible fix:

import lxml.etree as ET
# Python 3: urlopen lives in urllib.request (urllib2 exists only in Python 2).
from urllib.request import urlopen

# The URL of the XML document to be transformed. Here it corresponds to a "GetRecords" output of a fictitious CSW, with the "maxRecords" parameter set to 10.
xmlURL = "http://some.site/csw?request=GetRecords&service=CSW&version=2.0.2&namespace=xmlns%28csw=http://www.opengis.net/cat/csw%29&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&outputFormat=application/xml&typeNames=csw:Record&elementSetName=full&constraintLanguage=CQL_TEXT&constraint_language_version=1.1.0&maxRecords=10"

# The URL pointing to the latest version of the XSLT.
xslURL = "https://raw.githubusercontent.com/SEMICeu/iso-19139-to-dcat-ap/master/iso-19139-to-dcat-ap.xsl"

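# Fetch both documents with urlopen, which supports HTTPS, and let lxml parse the responses.
xml = ET.parse(urlopen(xmlURL))
xsl = ET.parse(urlopen(xslURL))

transform = ET.XSLT(xsl)

# encoding="unicode" makes tostring return text rather than bytes, so the output prints as readable XML.
print(ET.tostring(transform(xml), encoding="unicode", pretty_print=True))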

Does this work?

arbakker commented 3 years ago

I wrote a Python script that solves this issue using urllib.request from the Python standard library. The script accepts URLs and file paths as arguments.
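
For readers who want to see the idea, here is a minimal sketch of how a script might accept either a URL or a local file path. It is not the contributed script itself: the function names `is_url`, `read_source` and `transform` are hypothetical, and it assumes lxml is installed for the transformation step.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(source: str) -> bool:
    """Return True if source looks like an HTTP(S) URL rather than a file path."""
    return urlparse(source).scheme in ("http", "https")


def read_source(source: str) -> bytes:
    """Read raw bytes from a URL (via urllib.request) or from a local file."""
    if is_url(source):
        with urlopen(source) as response:
            return response.read()
    return Path(source).read_bytes()


def transform(xml_source: str, xsl_source: str) -> str:
    """Apply the stylesheet to the document and return the result as text."""
    # Imported here so the helpers above work even where lxml is not installed.
    import lxml.etree as ET

    xml_doc = ET.fromstring(read_source(xml_source))
    xsl_doc = ET.fromstring(read_source(xsl_source))
    return str(ET.XSLT(xsl_doc)(xml_doc))
```

Under these assumptions, something like `print(transform("metadata.xml", "iso-19139-to-dcat-ap.xsl"))` would work with local copies, and either argument could be replaced by an `https://` URL. Downloading the bytes first and handing them to lxml sidesteps the HTTPS limitation of `etree.parse`.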

andrea-perego commented 3 years ago

Thanks for contributing the script, @arbakker . I included a link to it in the "How To" page (see commit https://github.com/SEMICeu/iso-19139-to-dcat-ap/commit/41026c285e688f551e92f3b6063a4c7d3bce8997).

andrea-perego commented 3 years ago

@jescriu , I updated the Python script as illustrated in https://github.com/SEMICeu/iso-19139-to-dcat-ap/issues/29#issuecomment-838545291 (see https://github.com/SEMICeu/iso-19139-to-dcat-ap/commit/41026c285e688f551e92f3b6063a4c7d3bce8997).

Does this fix address your issue?

About your other question:

The XSLT should always return a correct RDF file, irrespective of the tool you're using.

Other options for testing it are the GeoDCAT-AP API I mentioned earlier in this thread, or the command line tool kindly contributed above by @arbakker .
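
If you want a quick sanity check on a file produced in Notepad++, the sketch below may help. It only confirms that the output is well-formed XML, that its root element is rdf:RDF, and how many dcat:Dataset elements it contains. It does not validate the content against GeoDCAT-AP. The function name `summarize_rdf_xml` is hypothetical, and the check assumes the output is serialised as RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT_NS = "http://www.w3.org/ns/dcat#"


def summarize_rdf_xml(data: bytes) -> dict:
    """Parse RDF/XML output and report its root element and dataset count.

    Raises xml.etree.ElementTree.ParseError if the data is not well-formed XML.
    """
    root = ET.fromstring(data)
    return {
        # True when the document element is rdf:RDF.
        "root_is_rdf": root.tag == f"{{{RDF_NS}}}RDF",
        # Number of dcat:Dataset elements below the root.
        "dataset_count": len(root.findall(f".//{{{DCAT_NS}}}Dataset")),
    }
```

For example, reading the saved file with `open("output.rdf", "rb").read()` (a hypothetical file name) and passing the bytes to `summarize_rdf_xml` gives a small dictionary you can compare with the result of the GeoDCAT-AP API for the same record.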