Turbo87 / aerofiles

waypoint, task, tracklog readers and writers for aviation
http://aerofiles.readthedocs.org/
MIT License

problem with parsing igc file #31

Closed GliderGeek closed 5 years ago

GliderGeek commented 6 years ago

The following IGC file (renamed to .txt for GitHub upload) does not parse.

75V_GPS.txt

GliderGeek commented 6 years ago

Turns out the problem is not in the aerofiles library, but was caused by wrongly decoding with ascii. For documentation purposes and future reference:

The file contains the following line with an accent in the name: LCU::HPPLTPILOT:René de Dreu

This file should be read with the following code:

from aerofiles.igc import Reader
# pass the encoding explicitly so that 'é' decodes correctly
with open('75_GPS.igc', 'r', encoding='utf-8') as f:
    result = Reader().read(f)

The encoding argument defaults to the platform's locale encoding (ASCII in my case), which does not support the letter é.
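
For future reference, a minimal sketch of the failure, assuming the file stores the pilot line as UTF-8 bytes (which matches the fix above). It does not use aerofiles at all; it only shows that the same bytes fail to decode as ASCII and succeed as UTF-8.

```python
# The pilot line from the file, as it would be stored in UTF-8 (assumption).
raw = "LCU::HPPLTPILOT:René de Dreu".encode("utf-8")

# Decoding as ASCII fails because 'é' is stored as bytes above 0x7F.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the matching encoding recovers the original text.
decoded = raw.decode("utf-8")
```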

Turbo87 commented 6 years ago

well... the IGC file standard defines that only ASCII is legal but it seems that Naviter doesn't care about that...

GliderGeek commented 6 years ago

same as with the line length ;)

Turbo87 commented 6 years ago

yep, although in this case there might be different interpretations. I'm pretty sure I've also already seen IGC files where latin1 was used, so it's not that easy to just assume UTF8 🤔

GliderGeek commented 6 years ago

i wrongly assumed that latin1 is a subset of utf8 (which it isn't, ascii is a subset of both latin1 and utf-8).
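
A small sketch of what that means at the byte level, using only the standard Python codecs: pure ASCII text encodes identically in both, while é becomes one byte in latin1 and two bytes in utf-8, and the latin1 byte on its own is not valid utf-8.

```python
ascii_text = "PILOT"
accented = "é"

# ASCII characters produce the same bytes in both encodings.
same_for_ascii = ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

# The accented letter does not.
latin1_bytes = accented.encode("latin-1")  # single byte 0xE9
utf8_bytes = accented.encode("utf-8")      # two bytes 0xC3 0xA9

# The latin1 byte sequence is rejected by the utf-8 decoder.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
```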

any thoughts on how to detect this? i guess knowing the source of the igc file might make it predictable (files from soaringspot always have encoding x)

Turbo87 commented 6 years ago

> any thoughts on how to detect this?

it's complicated... there is no reliable way to detect this. the best-effort variant is reading as utf8 and if that fails trying with latin1 instead, which is what we do in XCSoar too iirc.

see also https://stackoverflow.com/a/22868803/1478093
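
A rough sketch of that best-effort fallback in Python. The helper name `decode_igc_bytes` is made up for illustration and is not part of aerofiles or XCSoar; it only assumes the file was read in binary mode first.

```python
def decode_igc_bytes(data: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, then fall back to Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every possible byte to a character, so this never
        # raises, but the result is wrong if the file used another encoding.
        return data.decode("latin-1")


# The same name encoded both ways decodes to the same text.
from_utf8 = decode_igc_bytes("René de Dreu".encode("utf-8"))
from_latin1 = decode_igc_bytes("René de Dreu".encode("latin-1"))
```

The order matters: trying latin1 first would never fail, so utf8 files would silently come out garbled. The decoded text could then be wrapped in `io.StringIO` and handed to `Reader().read`, assuming the reader accepts any file-like object that yields text lines.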