GPSBabel / gpsbabel

GPSBabel: convert, manipulate, and transfer data from GPS programs or GPS receivers. Open Source and supported on MacOS, Windows, Linux, and more. Pointy clicky GUI or a command line version...
https://www.gpsbabel.org
GNU General Public License v2.0

If a TCX file starts with whitespace, gpsbabel fails to recognize it as valid XML #371

Closed: Djailla closed this issue 5 years ago

Djailla commented 5 years ago

I'm trying to work with files exported from Strava (in .tcx.gz format).

First of all, gpsbabel does not support .gz files as input.

So I extracted the file and ran

gpsbabel -i gtrnctr -f 12380976.tcx -o gpx -F test.gpx

And it fails with this error:

XML Reader:Read error: XML declaration not at start of document. (12380976.tcx, line 1, col 65)

When I open the file, there are a few whitespace characters at the start. If I remove them, it works.

Could the XML parser be more tolerant of this whitespace?

Regards

12380976.tcx.gz https://github.com/gpsbabel/gpsbabel/files/3324153/12380976.tcx.gz

robertlipe commented 5 years ago

Hi, and welcome.

Sorry that it's taken me some time to feel well enough and have time on my hands to look into this, but I don't think I'm going to make you happy. (At least not at first.) I'd been thinking about this a while and did do some substantial research on this topic; I'm not dismissing this out of hand and without thought.

I can't find any evidence that it's legal for a valid XML file of any type to start with whitespace before the declaration. It's pretty fundamental (for reasons of BOM markers and such) that this is one place where whitespace matters. If you paste the declaration with whitespace in front of it, e.g.

          <?xml version="1.0" encoding="UTF-8" standalone="no" ?>

into either a local validator or an online service like https://www.xmlvalidation.com/, it is rejected. Every validator I found rejected it.

"Fix" the file:

$ file /tmp/12380976.tcx
/tmp/12380976.tcx: XML 1.0 document text, ASCII text, with very long lines

Put the whitespace back into /tmp/12380976.tcx:

$ file /tmp/12380976.tcx
/tmp/12380976.tcx: ASCII text, with very long lines

If you wanted to hack up GPSBabel to read these "not really XML" files, you might be able to special case gpsbabel::File to eat that whitespace at the beginning or subclass QXmlStreamReader to make a QAlmostXmlStreamReader or something. It's not really trivial.

While I'm generally sympathetic to Postel's Principle, our TCX reader (indeed, all our readers) is really oriented to what devices create within the relevant specification. As long as Strava is making a new variation of files that won't even validate, I think that's on them to fix.

I think it's in Strava's court to fix their XML writer.

If you need help convincing Strava they're wrong, feel free to cite this.

RJL

tsteven4 commented 5 years ago

I agree with Robert: the XML declaration, if present, must be the first thing in the document.

See https://docstore.mik.ua/orelly/xml/xmlnut/ch02_09.htm

Also see https://www.w3.org/TR/xml/#sec-documents, 2.1 and 2.8.
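
A quick way to check a file against that rule before handing it to gpsbabel is to look at its first bytes. The sketch below is only an illustration: `starts_with_xml_decl` is a made-up helper name, it assumes a `head` that supports `-c`, and it deliberately does not handle a byte order mark in front of the declaration.

```shell
# Hypothetical helper, not part of GPSBabel.
# Succeeds (exit status 0) only if the first five bytes of the file are "<?xml".
starts_with_xml_decl() {
  [ "$(head -c 5 < "$1")" = "<?xml" ]
}
```

For example, `starts_with_xml_decl 12380976.tcx || echo "something precedes the XML declaration"` would flag the Strava export discussed here.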

Djailla commented 5 years ago

Agreed on the XML format. What about support for compressed files?

robertlipe commented 5 years ago

What device writes .gz on the device itself?

Compression is already a pain for us. We really don't need to be chasing WinRAR, 7-Zip, gzip, multi-file compression and other such. While we do have some formats that we decompress out of necessity, I'd rather not chase every format the OS finds convenient.

Yes, I know we have a copy of zlib included. It'd be problematic for us in things like dragging and dropping a file...

Sorry, but that's probably still best left outside GPSBabel.
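
Doing the decompression outside GPSBabel can be a single step before the conversion. This is a minimal sketch, assuming `gzip` is installed; `gunzip_keep` is a hypothetical helper name, not something GPSBabel provides.

```shell
# Hypothetical helper: write foo.tcx next to foo.tcx.gz and keep the .gz file.
gunzip_keep() {
  gzip -dc < "$1" > "${1%.gz}"
}
```

Calling `gunzip_keep 12380976.tcx.gz` would leave `12380976.tcx` beside the archive, ready to pass to `gpsbabel -f`.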

Djailla commented 5 years ago

Agreed

Nathanllee1 commented 3 years ago

I used this one-liner to correct all my Strava-exported files:

for f in *.tcx; do tail -c +11 "$f" > "edited_$f"; done

There were 11 spaces in front of the declaration, so this just grabs everything except those and outputs it to a file.
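
Note that `tail -c +11` starts its output at byte 11, so it drops exactly the first 10 bytes, and a fixed offset only works when every file carries the same amount of padding. A variant that does not depend on the count could look like the sketch below. It assumes a POSIX `awk`; `strip_leading_ws` and `fix_all_tcx` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: drop blank lines and leading spaces, tabs, or carriage
# returns in front of the first non-blank line; later lines pass through as-is.
strip_leading_ws() {
  awk 'started { print; next }
       /[^ \t\r]/ { sub(/^[ \t\r]+/, ""); print; started = 1 }'
}

# Write a cleaned copy of every .tcx file in the current directory.
fix_all_tcx() {
  for f in *.tcx; do
    [ -e "$f" ] || continue                   # no .tcx files: the glob stays literal
    case "$f" in edited_*) continue ;; esac   # skip copies from an earlier run
    strip_leading_ws < "$f" > "edited_$f"
  done
}
```

Only the text in front of the declaration is touched, so indentation inside the document is preserved.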

gkatsanos commented 11 months ago

Just discovered this thread while encountering the same issue. I got a list of .tcx files from Strava which I want to convert to .gpx so I can import them into Garmin Connect. Trying to convert them with GPSBabel throws the same error. After I removed the whitespace with a bash script, GPSBabel was able to convert them. So far, so good. But the resulting .gpx files aren't accepted by Garmin.

... Meanwhile, https://www.gpsvisualizer.com/ is able to convert TCX to GPX directly (ignoring the whitespace issue), and the resulting file is accepted by Garmin.

Which leads me to believe that, standards aside, consumers of these files in the market somehow expect a different structure / whitespace / something.

If you have any idea which flag GPSBabel needs to convert successfully, let me know. (I have a big list of files to convert, so I'd prefer doing it with the GPSBabel CLI instead of a web GUI, even though I might just have to.)

tsteven4 commented 11 months ago

@gkatsanos please provide a test case. The problem with the initial spaces in the TCX file is on Strava, but it is unclear why the GPX file written by gpsbabel isn't accepted by Garmin. If you can provide a test case, including an edited TCX file and the resulting GPX file, we can verify whether the GPX is valid.

gkatsanos commented 11 months ago

The content is too long to paste. I did some trial and error (using the export from the other tool), and replacing

<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.0" creator="GPSBabel - https://www.gpsbabel.org" xmlns="http://www.topografix.com/GPX/1/0">

at the top, with:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<gpx version="1.1" creator="GPS Visualizer https://www.gpsvisualizer.com/" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">

seems to make it work. I hope that helps.

tsteven4 commented 11 months ago

@gkatsanos also please clarify what "isn't accepted by Garmin" means. After fixing the initial spaces and converting with

gpsbabel -i gtrnctr -f 12380976.tcx -o gpx -F 12380976.gpx

the resulting GPX file is accepted to create a course on Garmin Connect.

tsteven4 commented 11 months ago

You can force the GPX output version to 1.1 with -o gpx,gpxver=1.1
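
Combined with the conversion command used earlier in this thread, that option might be wrapped as follows. This is a sketch rather than an official recipe: `tcx_to_gpx11` is a hypothetical function name, and it assumes `gpsbabel` is on the PATH and that the input file name ends in `.tcx`.

```shell
# Hypothetical wrapper around the command line from this thread.
# ride.tcx -> ride.gpx, written as GPX 1.1.
tcx_to_gpx11() {
  gpsbabel -i gtrnctr -f "$1" -o gpx,gpxver=1.1 -F "${1%.tcx}.gpx"
}
```

For a batch of already-cleaned files, something like `for f in edited_*.tcx; do tcx_to_gpx11 "$f"; done` would convert each one in turn.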

gkatsanos commented 11 months ago

That did the trick.

Thank you!

(I meant that the Garmin importer threw an error when uploading the .gpx file. Not anymore.)