GeosoftInc / gxpy

Python for Geosoft GX Developer
https://geosoftgxdev.atlassian.net/wiki/spaces/GD/overview
BSD 2-Clause "Simplified" License

GDB files containing latin-1 characters cannot be opened #136

Open: mplough-kobold opened this issue 2 months ago

mplough-kobold commented 2 months ago

We often receive GDB files from contractors that contain data descriptions such as µT, but the descriptions are encoded in Latin-1 (ISO/IEC 8859-1). As a result, the µ character is stored as the single byte 0xB5 rather than as the two-byte sequence 0xC2 0xB5 that UTF-8 would use.
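For illustration, Python's built-in codecs show how the micro sign (U+00B5) is stored under each encoding:

```python
# The micro sign, U+00B5, written as an escape so the exact code point is unambiguous
micro = "\u00b5"

latin1_bytes = micro.encode("latin-1")  # one byte in ISO/IEC 8859-1
utf8_bytes = micro.encode("utf-8")      # two-byte sequence in UTF-8

print(latin1_bytes)  # b'\xb5'
print(utf8_bytes)    # b'\xc2\xb5'
```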

Since Python strings are Unicode and gxpy decodes the bytes it reads using the default UTF-8 codec, we end up with errors like this:

'utf-8' codec can't decode byte 0xb5 in position 0: invalid start byte
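The same error can be reproduced without gxpy by decoding Latin-1 bytes as UTF-8. The bytes below are an assumed example of how `µT` would be stored in such a file, not data taken from an actual GDB:

```python
# "µT" as it would be stored in Latin-1 (assumed example bytes)
raw = b"\xb5T"

message = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xB5 is a UTF-8 continuation byte, so it cannot begin a character
    message = str(exc)

print(message)
```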

Handling other character encodings (or perhaps only latin1 if that's what GDB uses) would allow us to open these files with gxpy.
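One possible approach, sketched here as a hypothetical helper (`decode_gdb_string` is not part of gxpy), is to try UTF-8 first and fall back to Latin-1 when decoding fails. Latin-1 assigns a character to every byte value from 0x00 to 0xFF, so the fallback itself never raises:

```python
def decode_gdb_string(raw: bytes) -> str:
    """Hypothetical helper: decode bytes read from a GDB as UTF-8, falling back to Latin-1.

    Illustration only; this is not an existing gxpy function.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 (ISO/IEC 8859-1) maps every byte to a code point,
        # so this decode cannot fail
        return raw.decode("latin-1")


print(decode_gdb_string(b"\xc2\xb5T"))  # UTF-8 input  -> µT
print(decode_gdb_string(b"\xb5T"))      # Latin-1 input -> µT
```

The heuristic is not foolproof: some Latin-1 byte sequences also happen to be valid UTF-8, and a file in another single-byte encoding (such as cp1252) would decode without error but possibly to the wrong characters. A robust fix would depend on knowing which encoding GDB files actually use.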

RichardScottOZ commented 1 week ago

Yes, I have seen this error recently too.

serban-seeq commented 6 days ago

@mplough-kobold Can you share a sample GDB with the ISO/IEC 8859-1 encoding?

mplough-kobold commented 5 days ago

@serban-seeq It's from SIGÉOM; see https://gq.mines.gouv.qc.ca/documents/EXAMINE/GM67278/. The archive GM67278_1_CD1.ZIP contains a file called Deborah_Lake.gdb. The file is too large to upload here, but the source is publicly available.