RMLio / yarrrml-parser

A YARRRML parser library and CLI in JavaScript
MIT License

How to refer to default namespace in XML sources? #158

Open rorlic opened 2 years ago

rorlic commented 2 years ago

Issue type: Question

I am testing Matey and the yarrrml-parser with an XML source. The source is an Atom feed that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
    <title>USGS Magnitude 2.5+ Earthquakes, Past Hour</title>
    <updated>2022-04-08T13:42:05Z</updated>
    <author>
        <name>U.S. Geological Survey</name>
        <uri>https://earthquake.usgs.gov/</uri>
    </author>
    <id>https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_hour.atom</id>
    <link rel="self" href="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_hour.atom"/>
    <icon>https://earthquake.usgs.gov/favicon.ico</icon>
    <entry>
        <id>urn:earthquake-usgs-gov:ak:0224iah979</id>
        <title>M 2.5 - 49 km NNW of Koyukuk, Alaska</title>
        <updated>2022-04-08T13:22:18.805Z</updated>
        <link rel="alternate" type="text/html" href="https://earthquake.usgs.gov/earthquakes/eventpage/ak0224iah979"/>
        <summary type="html"><![CDATA[<dl><dt>Time</dt><dd>2022-04-08 13:12:17 UTC</dd><dd>2022-04-08 13:12:17 UTC at epicenter</dd><dt>Location</dt><dd>65.253&deg;N 158.271&deg;W</dd><dt>Depth</dt><dd>7.20 km (4.47 mi)</dd></dl>]]></summary>
        <georss:point>65.2534 -158.2711</georss:point>
        <georss:elev>-7200</georss:elev>
        <category label="Age" term="Past Hour"/>
        <category label="Magnitude" term="Magnitude 2"/>
        <category label="Contributor" term="ak"/>
        <category label="Author" term="ak"/>
    </entry>
</feed>

The YARRRML file looks like this:

prefixes:
  ex: http://earthquake.usgs.gov/
  dc: http://purl.org/dc/terms/

mappings:
  earthquake:
    sources:
      - [earthquake.xml~xpath, /feed/entry]
    s: ex:id/earthquake/$(./id)
    po:
      - [a, ex:earthquake]
      - [dc:title, $(./title)]

Note that the Atom feed declares a default XML namespace on its root element: xmlns="http://www.w3.org/2005/Atom". I have specified the iterator as /feed/entry, but this produces no triples, probably because the XPath expression does not match any XML elements.
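The suspected mismatch can be reproduced outside of YARRRML. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (not the XPath engine used by the RML mapper) and a trimmed-down copy of the feed. It only illustrates the general rule that an unprefixed name in a path matches elements in no namespace; the prefix "atom" is a hypothetical name chosen for this example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Trimmed-down copy of the feed from the question (one entry, fewer fields).
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
    <title>USGS Magnitude 2.5+ Earthquakes, Past Hour</title>
    <entry>
        <id>urn:earthquake-usgs-gov:ak:0224iah979</id>
        <title>M 2.5 - 49 km NNW of Koyukuk, Alaska</title>
    </entry>
</feed>"""

root = ET.fromstring(FEED)

# The default namespace becomes part of every element name.
print(root.tag)

# An unprefixed name only matches elements in no namespace: nothing is found.
unqualified = root.findall("entry")

# Binding a prefix ("atom" is an arbitrary, hypothetical choice) to the
# namespace URI makes the same path match.
namespaces = {"atom": ATOM_NS}
qualified = root.findall("atom:entry", namespaces)

print(len(unqualified), len(qualified))
for entry in qualified:
    print(entry.find("atom:id", namespaces).text)
```

In this sketch the prefix binding is passed to the query explicitly, which is the piece that the YARRRML mapping above has no way to express.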

Is there a way to tell YARRRML and the RML mapper to use the default XML namespace when resolving XPath expressions? If so, how do I specify the namespace prefix in the iterator (e.g. /ns:feed/ns:entry) and in other places?

BTW, using an XML namespace prefix in the subject mapping produces an incorrect RML file. For example, s: ex:id/earthquake/$(./ns:id) produces:

map:s_000 rdf:type rr:SubjectMap ;
  rr:template "http://earthquake.usgs.gov/id/earthquake/{./ns" .

I will log this as a separate bug.

pheyvaer commented 2 years ago

Hi @rorlic

In YARRRML you cannot set the default XML namespace, and this is also not supported in RML at the moment. There is an open issue about this at https://github.com/kg-construct/rml-target-source-spec/issues/4