JuliaData / DBFTables.jl

Read and write DBF (dBase) tabular data in Julia

Use correct string encodings #19

Open rafaqz opened 1 year ago

rafaqz commented 1 year ago

dBase files specify their string encoding, and occasionally it's not ASCII. We should probably read that and use something like https://github.com/JuliaStrings/StringEncodings.jl to convert to UTF-8.
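A minimal sketch of that conversion, assuming the StringEncodings.jl package is installed and that the field bytes are Windows-1252 (the byte values and the choice of encoding here are illustrative, not taken from any particular file):

```julia
using StringEncodings  # third-party package; install with `] add StringEncodings`

# "niño" as it might be stored in a Windows-1252 DBF field.
# In that code page "ñ" is the single byte 0xF1, which is not valid UTF-8 on its own.
raw = UInt8[0x6e, 0x69, 0xf1, 0x6f]

# decode(bytes, encoding) converts the bytes to a regular (UTF-8) Julia String.
s = decode(raw, "CP1252")
```

Interpreting `raw` directly as a `String` would leave an invalid byte where the "ñ" should be, which is the kind of breakage non-ASCII files show today.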

See: https://github.com/JuliaGeo/Shapefile.jl/issues/63

ErickChacon commented 1 year ago

I have a dbf file that includes the letter ñ (common in Spanish). It is read correctly with ArchGDAL.jl, but not with Shapefile.jl, which uses DBFTables.jl.

rafaqz commented 1 year ago

Yes, that's expected currently; someone needs to implement what I suggested above.

Try a PR if you like.

rafaqz commented 1 year ago

A good approach would be to connect StringEncodings.jl to the dbf byte codes or "Language driver id" in the table here: http://www.dbase.com/Knowledgebase/INT/db7_file_fmt.htm

There is already a field for this in the header, we just don't use it: https://github.com/JuliaData/DBFTables.jl/blob/7ea8090fb95419a48cd34c5523a5c57f6e88ff24/src/DBFTables.jl#L54
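A rough sketch of how the language driver ID could be tied to an encoding name. The table is deliberately partial and should be checked against the linked format description; `LDID_TO_ENCODING`, `encoding_for` and `read_ldid` are hypothetical names, not part of DBFTables.jl. It assumes the ID is the single byte at zero-based offset 29 of the file header.

```julia
# Partial, illustrative map from language driver ID to an encoding name
# understood by StringEncodings.jl. Verify entries before relying on them.
const LDID_TO_ENCODING = Dict{UInt8,String}(
    0x01 => "CP437",   # DOS USA
    0x02 => "CP850",   # DOS multilingual
    0x03 => "CP1252",  # Windows ANSI
    0x57 => "CP1252",  # ANSI
    0xc8 => "CP1250",  # Windows Eastern European
    0xc9 => "CP1251",  # Windows Cyrillic
)

# Hypothetical helper: returns `nothing` for unknown IDs (including 0x00),
# so the caller decides on a fallback.
encoding_for(ldid::UInt8) = get(LDID_TO_ENCODING, ldid, nothing)

# Hypothetical helper: read the language driver ID straight from a DBF stream.
function read_ldid(io::IO)
    seek(io, 29)          # seek positions are zero-based
    return read(io, UInt8)
end
```

Returning `nothing` rather than guessing keeps the fallback policy (ASCII only, a user-supplied encoding, or an error) out of the lookup itself.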

We limit to ASCII on write; we can also fix that: https://github.com/JuliaData/DBFTables.jl/blob/7ea8090fb95419a48cd34c5523a5c57f6e88ff24/src/DBFTables.jl#L90
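On the write side, something along these lines might replace the ASCII restriction. This is only a sketch: `encode_field` is a hypothetical helper, and it assumes character fields are fixed-width and padded with spaces, which may not match how DBFTables.jl lays out records.

```julia
using StringEncodings  # third-party package; install with `] add StringEncodings`

# Hypothetical helper: encode `s` in the target encoding and pad it with
# spaces (0x20) to the field width, measured in bytes rather than characters.
function encode_field(s::AbstractString, width::Integer, enc::AbstractString)
    bytes = encode(s, enc)  # Vector{UInt8} in the target encoding
    if length(bytes) > width
        throw(ArgumentError("encoded value needs $(length(bytes)) bytes, field holds $width"))
    end
    return vcat(bytes, fill(0x20, width - length(bytes)))
end

field = encode_field("niño", 6, "CP1252")
```

The width check happens after encoding because the byte length depends on the encoding: "ñ" takes one byte in CP1252 but two in UTF-8.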

It's probably like this because there is no real spec for dbf; there are just implementations, and they differ: https://stackoverflow.com/questions/52607578/where-is-the-definitive-official-specification-of-the-dbase-dbf-file-format

ErickChacon commented 1 year ago

I have no experience at all with DBFTables.jl, but I will try to take a look and follow your suggestions.