Open labra opened 4 years ago
I was playing with timbl's sed program. From this gpx example (taken from Wikipedia):
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" creator="Oregon 400t" version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.garmin.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd">
<metadata>
<link href="http://www.garmin.com">
<text>Garmin International</text>
</link>
<time>2009-10-17T22:58:43Z</time>
</metadata>
<trk>
<name>Example GPX Document</name>
<trkseg>
<trkpt lat="47.644548" lon="-122.326897">
<ele>4.46</ele>
<time>2009-10-17T18:37:26Z</time>
</trkpt>
<trkpt lat="47.644548" lon="-122.326897">
<ele>4.94</ele>
<time>2009-10-17T18:37:31Z</time>
</trkpt>
<trkpt lat="47.644548" lon="-122.326897">
<ele>6.87</ele>
<time>2009-10-17T18:37:34Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>
it generates this Turtle file:
@prefix gps: <http://hackdiary.com/ns/gps#> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<#ThisRecord> a gps:Record;
wgs84:time "2009-10-17T22:58:43Z";
gps:track [
gps:trackpoint [wgs84:lat 47.644548e+00; wgs84:long -122.326897e+00;
wgs84:altitude 4.46e0;
wgs84:time "2009-10-17T18:37:26Z";
];
gps:trackpoint [wgs84:lat 47.644548e+00; wgs84:long -122.326897e+00;
wgs84:altitude 4.94e0;
wgs84:time "2009-10-17T18:37:31Z";
];
gps:trackpoint [wgs84:lat 47.644548e+00; wgs84:long -122.326897e+00;
wgs84:altitude 6.87e0;
wgs84:time "2009-10-17T18:37:34Z";
];
];
.
If you want to try timbl's converter with other files, you need sed and you can run it with
sed -f gpx2n3.sed file.gpx
One problem is that it only converts track points and ignores all other GPX content. For example, if you try it with another example that contains only waypoints, it just generates the following:
@prefix gps: <http://hackdiary.com/ns/gps#> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<#ThisRecord> a gps:Record .
Someone seems to have already been working on the sed script, but I couldn't resist looking into it. I've changed it a bit so that it also processes waypoints, and along the way I noticed several things that look relevant for our purposes.
First, the new script:
1i\
@prefix gps: <http://hackdiary.com/ns/gps#> . \
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .\
\
<#ThisRecord> a gps:Record;
s/<trkpt lat="\([0-9\.-]*\)" lon="\([0-9\.-]*\)">/gps:trackpoint [wgs84:lat \1e+00; wgs84:long \2e+00;/
s?</trkpt>?];?
s/<wpt lat="\([0-9\.-]*\)" lon="\([0-9\.-]*\)">/gps:trackpoint [wgs84:lat \1e+00; wgs84:long \2e+00;/
s?<extensions>.*</extensions>??
s?</wpt>?];?
s?<ele>\([0-9\.-]*\)</ele>? wgs84:altitude \1e0;?
s?<time>\([_A-Z:0-9\.-]*\)</time>? wgs84:time "\1";?
/<speed>.*<.speed>/d
/<course>.*<.course>/d
s?<trkseg>??
s?</trkseg>??
s?<trk>? gps:track [?
s?</trk>?];?
#/<gpx/,/</-1d
s?</gpx>?.?
/<wpt/,/<\/wpt/d
/<?xml/d
I made it so that extensions are deleted and waypoints are processed in the same way as trackpoints were in the original version. I also removed the creation of temporary octothorpes as markers for new lines, since this only worked for a specific line layout and indentation.
The original looks designed for one specific formatting and would probably not work if the line layout were different: it deletes every line containing speed or course information, for example. My version does not solve this perfectly either. Lines matter a great deal to sed because it is a stream editor (Stream EDitor) that processes its input line by line.
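To illustrate the line dependence described above, here is a minimal Python sketch that mimics the sed rule `/<speed>.*<.speed>/d` with the standard `re` module. The function name `drop_speed_lines` and the sample input are hypothetical; the only point is that the same track point survives or disappears depending on where the line breaks fall.

```python
import re

# Same pattern as the sed rule /<speed>.*<.speed>/d: any line that
# contains a <speed> element is dropped entirely.
SPEED_LINE = re.compile(r"<speed>.*<.speed>")


def drop_speed_lines(text: str) -> str:
    """Delete whole lines containing a <speed> element, as the sed rule does."""
    kept = [line for line in text.splitlines() if not SPEED_LINE.search(line)]
    return "\n".join(kept)


# Hypothetical input: one track point, formatted in two different ways.
multi_line = (
    '<trkpt lat="47.644548" lon="-122.326897">\n'
    "<ele>4.46</ele>\n"
    "<speed>1.2</speed>\n"
    "</trkpt>"
)
single_line = multi_line.replace("\n", "")

# One element per line: only the <speed> line is removed.
print(drop_speed_lines(multi_line))
# Everything on one line: the whole track point is deleted with it.
print(repr(drop_speed_lines(single_line)))
```

With one element per line only the speed is lost, but when the exporter writes the track point on a single line, the coordinates, elevation and time are deleted as well.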
Also, the closing </gpx> tag is replaced by a line containing a dot, probably for processing in Unix streams, so we might not want that either.
Indentation of the final text is probably not important for us if it is just one step in the processing of source .gpx files.
Maybe too much information for a specific topic, but I would like to see discussion about whether to stick to sed to process input .gpx files, which would be trivial to set up on the host machines, or to start working on another approach.
@InigoGutierrez I've been working with timbl's sed converter, and I found your new script very interesting. Nevertheless, I don't know if this is the best option we can follow to get the maximum information.
To show you an example, as I explained in this issue, I have taken a GPX file downloaded from Strava, which lets its users export the routes they record very easily.
The GPX file (shortened) is this one:
<?xml version="1.0" encoding="UTF-8"?>
<gpx creator="StravaGPX Android" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd" version="1.1" xmlns="http://www.topografix.com/GPX/1/1">
<metadata>
<time>2019-05-11T18:16:59Z</time>
</metadata>
<trk>
<name>Atletismo al anochecer</name>
<type>9</type>
<trkseg>
<trkpt lat="43.3614310" lon="-5.8551880">
<ele>255.5</ele>
<time>2019-05-11T18:16:59Z</time>
</trkpt>
<trkpt lat="43.3614120" lon="-5.8551590">
<ele>255.5</ele>
<time>2019-05-11T18:17:13Z</time>
</trkpt>
<trkpt lat="43.3613890" lon="-5.8551320">
<ele>255.5</ele>
<time>2019-05-11T18:17:15Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>
With timbl's gpx2n3.sed converter we can obtain this Turtle file:
@prefix gps: <http://hackdiary.com/ns/gps#> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<#ThisRecord> a gps:Record;
wgs84:time "2019-05-11T18:16:59Z";
gps:track [
gps:trackpoint [wgs84:lat 43.3614310e+00; wgs84:long -5.8551880e+00;
wgs84:altitude 255.5e0;
wgs84:time "2019-05-11T18:16:59Z";
];
gps:trackpoint [wgs84:lat 43.3614120e+00; wgs84:long -5.8551590e+00;
wgs84:altitude 255.5e0;
wgs84:time "2019-05-11T18:17:13Z";
];
gps:trackpoint [wgs84:lat 43.3613890e+00; wgs84:long -5.8551320e+00;
wgs84:altitude 255.5e0;
wgs84:time "2019-05-11T18:17:15Z";
];
];
.
Now, with your new sed converter:
@prefix gps: <http://hackdiary.com/ns/gps#> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<#ThisRecord> a gps:Record;
<gpx creator="StravaGPX Android" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd" version="1.1" xmlns="http://www.topografix.com/GPX/1/1">
<metadata>
wgs84:time "2019-05-11T18:16:59Z";
</metadata>
gps:track [
<name>Atletismo al anochecer</name>
<type>9</type>
gps:trackpoint [wgs84:lat 43.3614310e+00; wgs84:long -5.8551880e+00;
wgs84:altitude 255.5e0;
wgs84:time "2019-05-11T18:16:59Z";
];
gps:trackpoint [wgs84:lat 43.3614120e+00; wgs84:long -5.8551590e+00;
wgs84:altitude 255.5e0;
wgs84:time "2019-05-11T18:17:13Z";
];
gps:trackpoint [wgs84:lat 43.3613890e+00; wgs84:long -5.8551320e+00;
wgs84:altitude 255.5e0;
wgs84:time "2019-05-11T18:17:15Z";
];
];
.
My point is that with your proposed converter I obtain more information, such as the route's name, but I think there are more problems, especially regarding that
<gpx creator="StravaGPX Android" xmlns:xsi="http://...
line. It would be very helpful if you could explain why you decided to include that line. Using timbl's converter, we can get all the necessary data, such as trackpoints, with no problem, but it also has some drawbacks.
Also, I think this new idea presented by Labra is actually a very good option, using a ShEx schema instead.
There are several approaches to converting XML to RDF.
For the third approach, I have created a GPX2RDF converter using XSLT. It is available at viadeSpec/converters/gpx2rdf.xslt.
The XSLT transformation could also be embedded in and invoked from a high-level programming language using one of the available XSLT libraries, or it can be invoked from the command line.
If you want to try it on the command line, you need an XSLT processor. Using xsltproc, for example, you can run the conversion as follows:
xsltproc gpx2rdf.xslt file.gpx > converted.rdf
The result is in RDF/XML format, which can easily be converted to Turtle or JSON-LD using RDFShape.
My point is that with your proposed converter I obtain more information, such as the route's name, but I think there are more problems, especially regarding that
<gpx creator="StravaGPX Android" xmlns:xsi="http://...
line. It would be very helpful if you could explain why you decided to include that line. Using timbl's converter, we can get all the necessary data, such as trackpoints, with no problem, but it also has some drawbacks.
What we need is a tool that parses an input file and generates a new one, rather than one that makes ad hoc changes to the original text. These sed scripts take the latter approach, which is why some original lines are kept and why so many small details must be taken into account. XSLT looks better.
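As a rough sketch of the parse-then-generate idea, and not a replacement for the XSLT converter, the following Python uses the standard `xml.etree.ElementTree` parser to read a GPX 1.1 document and emit Turtle with the same `gps:` and `wgs84:` vocabulary as the sed output. The function names and the sample input are hypothetical, only track points are handled, and values are copied through without validation or escaping.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace, as declared in the sample files in this thread.
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

HEADER = (
    "@prefix gps: <http://hackdiary.com/ns/gps#> .\n"
    "@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .\n"
    "<#ThisRecord> a gps:Record"
)


def trackpoint_to_turtle(pt: ET.Element) -> str:
    """Render one <trkpt> as a blank node; other child elements are ignored."""
    # Assumption: lat, lon and ele are already valid Turtle numeric literals.
    props = [f"wgs84:lat {pt.get('lat')}", f"wgs84:long {pt.get('lon')}"]
    ele = pt.findtext("gpx:ele", namespaces=NS)
    if ele is not None:
        props.append(f"wgs84:altitude {ele}")
    time = pt.findtext("gpx:time", namespaces=NS)
    if time is not None:
        props.append(f'wgs84:time "{time}"')
    return "[ " + "; ".join(props) + " ]"


def gpx_to_turtle(gpx_text: str) -> str:
    """Build Turtle from the parsed tree; nothing from the input leaks through."""
    root = ET.fromstring(gpx_text)
    parts = [HEADER]
    for trk in root.findall("gpx:trk", NS):
        points = [trackpoint_to_turtle(pt)
                  for pt in trk.findall("gpx:trkseg/gpx:trkpt", NS)]
        if points:
            parts.append("gps:track [\n  gps:trackpoint "
                         + ",\n    ".join(points) + "\n]")
    return " ;\n".join(parts) + " .\n"


# Hypothetical single-line input with elements the converter does not know.
sample = (
    '<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1"><trk>'
    "<name>Example</name><trkseg>"
    '<trkpt lat="47.644548" lon="-122.326897"><ele>4.46</ele>'
    "<time>2009-10-17T18:37:26Z</time><speed>1.2</speed></trkpt>"
    "</trkseg></trk></gpx>"
)
print(gpx_to_turtle(sample))
```

Because the output is assembled from the parsed tree, line layout in the source no longer matters, and unhandled elements such as the `<gpx ...>` start tag or `<name>` cannot end up in the Turtle by accident.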
Also, I think this new idea presented by Labra is actually a very good option, using a ShEx schema instead.
Don't we still need a way of converting them? A ShEx schema is just the specification of the target file we need to generate from the source GPX file.
Either way, the actual conversion mechanism does not need to be defined by the specification.
Don't we still need a way of converting them?
Yes, I didn't mean that we can use ShEx to do it.
As you say, the conversion mechanism is not necessary for the specification. Anyway, I think the new approach that Labra has mentioned, of using his new GPX2RDF converter, is very promising.
@timbl has suggested including support for GPX, as it is a popular format supported by lots of devices. To do that, there are several possibilities to consider:
Create an RDF-based representation of the GPX format. I think the latest (1.1) specification is defined here. One possibility which doesn't seem difficult is to represent that spec using ShEx.
Another possibility could be to translate a subset of the GPX format to Turtle. That's the approach followed by this sed program.
Finally, another possibility could be to include the raw GPX file as the value in a Turtle file that represents the routes and increasingly add more metadata to it. The solution could be to start with something like: