Open rafaqz opened 1 year ago
I have a dbf file that includes the letter ñ (common in Spanish). It is read correctly with ArchGDAL.jl, but not with Shapefile.jl, which uses DBFTables.jl.
Yes, that's expected currently; someone needs to implement what I suggested above.
Try a PR if you like.
A good approach would be to connect StringEncodings.jl to the dbf byte codes or "Language driver id" in the table here: http://www.dbase.com/Knowledgebase/INT/db7_file_fmt.htm
There is already a field for this in the header; we just don't use it: https://github.com/JuliaData/DBFTables.jl/blob/7ea8090fb95419a48cd34c5523a5c57f6e88ff24/src/DBFTables.jl#L54
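To make the idea concrete, here is a rough sketch of the read side, written in Python only for illustration (it is not DBFTables.jl code). It assumes the language driver ID sits at byte offset 29 of the header and uses a small, partial ID-to-code-page table; the function names and the table contents are assumptions for this example, and since dbf implementations differ, the mapping should be checked against the table linked above.

```python
# Partial, illustrative mapping from the DBF "language driver ID" byte
# to a Python codec name. These entries are assumptions for the sketch;
# a real implementation would need the full table.
LDID_TO_CODEC = {
    0x01: "cp437",   # DOS USA
    0x02: "cp850",   # DOS multilingual
    0x03: "cp1252",  # Windows ANSI
    0x57: "cp1252",  # ANSI
    0xC8: "cp1250",  # Windows Eastern European
    0xC9: "cp1251",  # Windows Cyrillic
}


def codec_for_header(header: bytes, default: str = "ascii") -> str:
    """Pick a codec from the language driver ID (assumed at offset 29)."""
    ldid = header[29]
    return LDID_TO_CODEC.get(ldid, default)


def decode_field(raw: bytes, codec: str) -> str:
    """Decode one fixed-width field and drop the space padding."""
    return raw.decode(codec).strip()


# Hypothetical 32-byte header with language driver ID 0x03 (cp1252).
header = bytearray(32)
header[29] = 0x03

# A 10-byte character field holding "España", space-padded.
raw = "España".encode("cp1252").ljust(10, b" ")
print(decode_field(raw, codec_for_header(bytes(header))))
```

In Julia the same shape would presumably be a lookup from the header byte to an encoding name, followed by a call into StringEncodings.jl to produce a UTF-8 String.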
We limit to ASCII on write; we can also fix that:
https://github.com/JuliaData/DBFTables.jl/blob/7ea8090fb95419a48cd34c5523a5c57f6e88ff24/src/DBFTables.jl#L90
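A minimal sketch of what lifting the ASCII limit on write could look like, again in Python purely as an illustration and not as the DBFTables.jl implementation. The function name `encode_field`, the default codec, and the error handling are assumptions; the point is only that the string is encoded to the code page declared in the header, checked against the field width in bytes, and space-padded.

```python
def encode_field(value: str, width: int, codec: str = "cp1252") -> bytes:
    """Encode a string into a fixed-width, space-padded DBF field.

    Raises UnicodeEncodeError if the value cannot be represented in the
    chosen code page, and ValueError if the encoded bytes do not fit.
    """
    raw = value.encode(codec)
    if len(raw) > width:
        raise ValueError(
            f"{value!r} needs {len(raw)} bytes but the field is {width} wide"
        )
    return raw.ljust(width, b" ")


# "ñ" and "ú" each take one byte in cp1252, so the value fits in 8 bytes.
field = encode_field("ñandú", 8)
print(field)
```

The width check is done on the encoded bytes rather than on the character count, since the two can differ once the encoding is no longer ASCII.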
Probably the reason it's like this is that there is no real spec for dbf; there are just implementations, and they differ: https://stackoverflow.com/questions/52607578/where-is-the-definitive-official-specification-of-the-dbase-dbf-file-format
I have no experience at all with DBFTables.jl, but I will try to take a look and follow your suggestions.
dBase files specify the string encoding, and occasionally it's not ASCII. We should probably read that and use something like https://github.com/JuliaStrings/StringEncodings.jl to convert to UTF-8.
See: https://github.com/JuliaGeo/Shapefile.jl/issues/63