activityworkshop / GpsPrune

GpsPrune is a map-based application for viewing, editing and converting coordinate data from GPS systems.
GNU General Public License v2.0

Cannot load gpx file from OSM route manager (due to blank lines) #38

Closed sebastic closed 3 years ago

sebastic commented 3 years ago

As reported in Debian Bug #990641:

If I generate a gpx file with 'OSM Route manager', specifically the Ceredigion Coast path https://osmrm.openstreetmap.de/relation.jsp?id=1806040

when I try to load it with GPSprune it says: "Error reading file: The processing instruction target matching '[xX][mM][lL]' is not allowed."

This is quite a cryptic error.

If I try 'Import file with GPSbabel' instead I get the somewhat more helpful "GPX Read error: 'XML declaration not at start of document. "File: /tmp/Ceredigion+Coast+Path.gpx" Line: 13 column: 55'".

Looking at the file (attached) it has 12 leading blank lines. (0x0A unix linefeeds). Looks like GPSprune is expecting the xml header to be on the first line.

If you remove those 12 lines then the file loads as expected.

Now I presume that GPSprune is following the gpx spec, but perhaps it could be forgiving of leading whitespace? Certainly it could give a more useful error message of the form 'Could not load file : invalid GPX file'.

I had a look at the gpx spec, and the schema is here: https://www.topografix.com/GPX/1/1/gpx.xsd but I guess that whether the xml line must be on the very first line is actually part of the XML spec. I failed to find a simple validator I could run to check whether

The codebase for OSMRM is here: https://github.com/osmrmhv/osmrmhv/issues so if you are happy that GPSprune is correct to reject this file then I guess I should file an issue there instead.

wookey commented 3 years ago

I forgot to include the offending file in the original bug report (adding blank lines to the start of a file, or clicking the 'download gpx' button on the link given, is not too hard, but of course what's in OSM will change over time. I don't know if all gpx files that tool generates have this issue, or perhaps only long ones or ones with multiple segments). Anyway, here's the test case for future reference (I had to gzip it as this tool won't accept bare .gpx files): Ceredigion+Coast+Path.gpx.gz

activityworkshop commented 3 years ago

Ceredigion? Excellent, thank you very much for the question!

Thanks for reporting this, I've not seen this error before and I agree with you it seems like this simple whitespace problem should be ignorable somehow. Unfortunately (long story short) I don't think there's much that GpsPrune can do to fix it, but I'll try to explain why.

Why can't it be loaded?

You're completely correct: it can't be loaded because of the empty lines before the start. According to the xml spec, if the xml file has an XML declaration (the bit with <?xml , which opens the "prolog"), then this must come first, with nothing before it, not even whitespace. So it's nothing to do with the gpx spec, it's whether it's well-formed xml or not. And again, I agree with you, maybe the spec should be more tolerant of such additional whitespace, but for some presumably valid technical reason this was so specified many years ago. See for example this thread from 9 years ago: https://stackoverflow.com/questions/7939426/
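To make the rule concrete, here is a minimal sketch using the JDK's built-in SAX parser. It is not GpsPrune code: the class name `XmlDeclarationDemo`, the helper `isWellFormed` and the tiny GPX string are all made up for illustration. The same document is parsed twice, once as-is and once with 12 leading linefeeds as in the reported file (uses `String.repeat`, so it assumes Java 11 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: this class is hypothetical and not part of GpsPrune.
public class XmlDeclarationDemo {

    // Returns true if the JDK's SAX parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Thrown by the parser when the document is not well-formed
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for an in-memory document
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up minimal document standing in for a real GPX file
        String gpx = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<gpx version=\"1.1\" creator=\"demo\"></gpx>";
        String withBlankLines = "\n".repeat(12) + gpx;

        System.out.println("declaration first:   " + isWellFormed(gpx));
        System.out.println("12 blank lines first: " + isWellFormed(withBlankLines));
    }
}
```

The first call should be accepted and the second rejected, because the declaration is no longer at the very start of the input.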

Why can't GpsPrune ignore the whitespace?

Xml parsing is a very general job, needed by lots of different programs, and it's also difficult. So each program doesn't try to do the xml parsing itself, it uses a general library. In GpsPrune's case, it uses the java parsing libraries. (It normally uses java's "SAX" parser, or it can also use the separate "Xerces" library if it can find it, but both fail with your xml). So GpsPrune can't decide whether to ignore the whitespace or not, it's a decision made by the separate parsing library.

When GpsPrune imports a file using GPSBabel, the parser used is whatever GPSBabel uses, which isn't the java one. And that one also fails, although with a different message.

Why is the error message so cryptic?

When GpsPrune asks SAX to read the xml, it throws an exception (an org.xml.sax.SAXParseException) and this contains the error message as described by SAX. GpsPrune is told that the parsing failed, and shows the error message which SAX provided, so if it's cryptic, there's not much that GpsPrune can do apart from just show it. In this case the reason it's talking about processing instructions is that the parser thinks that this tag can't be the prolog because it's not at the beginning, so it must be a processing instruction instead.

Xerces produces a different message inside the exception that it throws, and GPSBabel's parser as you have seen returns a slightly more helpful one, but GpsPrune is just passing on what it has been told in each case.
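As a rough sketch of what "passing on what it has been told" looks like, the snippet below catches the `org.xml.sax.SAXParseException` and reads the message, line and column that the parser attached to it. This is not GpsPrune's actual code: the class `ParseErrorReport`, the method `describe` and the wording of the returned string are assumptions made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: a hypothetical helper, not GpsPrune's real loader.
public class ParseErrorReport {

    // Parses the text and returns "OK", or a description built from the
    // details that the parser put into its exception.
    static String describe(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return "OK";
        } catch (SAXParseException e) {
            // The message text comes from the parsing library, not from the caller
            return "Invalid XML at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            // Other failures are not expected for an in-memory document
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up minimal document with leading blank lines, like the reported file
        String xml = "\n\n\n<?xml version=\"1.0\"?>\n<gpx></gpx>";
        System.out.println(describe(xml));
    }
}
```

The calling program can add its own prefix (line and column, or a generic "invalid file" hint), but the core text of the message is whatever the library chose to say.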

What do the validators say?

If I search for "online xml validator" I find a bunch of available tools, and each one I tried gave errors, just expressed differently (e.g. "Error : InvalidXml, Line : 13, Message : XML declaration allowed only at the start of the document."). I tried with JOSM, Viking and gpxviewer, and of course GPSBabel, but nothing was able to load this file. So I think it's safe to say that the xml is really not well-formed, and the error lies with this "Route Manager" exporter. I've not seen this export option before so can't say whether this feature is new or not.

Other options

I was really sure that it was possible to export a gpx or a geojson file from the "OSM Relation Analyzer", but I can't find this option now. If that's gone then the only way I know to export a relation is using JOSM, but I agree that an export directly from a webpage would be more convenient.

Unfortunately I can't see anything that GpsPrune can do here, so the best bet is to report the problem to the owners of the "Route Manager" and see if they want to fix their export.
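Until the exporter is fixed, an affected file can be repaired by hand, as the original report found when removing the 12 blank lines. The sketch below automates that as a standalone step outside GpsPrune. The class `LeadingWhitespaceStripper` and its methods are hypothetical, and it assumes a single-byte-compatible encoding such as UTF-8 without a byte order mark.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Illustrative only: a hypothetical standalone repair tool, not part of GpsPrune.
public class LeadingWhitespaceStripper {

    // The four characters which the XML spec treats as whitespace
    static boolean isXmlWhitespace(byte b) {
        return b == 0x20 || b == 0x09 || b == 0x0A || b == 0x0D;
    }

    // Returns a copy of the data without any whitespace bytes at the start.
    // Assumes UTF-8 (or similar) without a byte order mark.
    static byte[] stripLeadingWhitespace(byte[] data) {
        int start = 0;
        while (start < data.length && isXmlWhitespace(data[start])) {
            start++;
        }
        return Arrays.copyOfRange(data, start, data.length);
    }

    // Usage: java LeadingWhitespaceStripper broken.gpx fixed.gpx
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: LeadingWhitespaceStripper <input> <output>");
            return;
        }
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), stripLeadingWhitespace(original));
    }
}
```

This only removes whitespace before the first non-blank byte, so a file with other problems would still be rejected by the parser.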

activityworkshop commented 3 years ago

Closed due to lack of response since July.