GeospatialPython / pyshp

This library reads and writes ESRI Shapefiles in pure Python.
MIT License

Cannot create shapefile with unicode field names #139

Closed ekeydar closed 5 years ago

ekeydar commented 6 years ago

I'm using the new (2.0 dev) version with Python 3 and trying to create unicode field names (in Hebrew). This fails for me, because the field names are explicitly encoded to ASCII. In the previous version we succeeded in exporting unicode field names (we did the encoding manually on our side). If I encode the field name manually before calling record.field, I get a different error.

I've attached two examples.

Thanks!

ex.py.gz ex2.py.gz

karimbahgat commented 6 years ago

This is intentional. The shapefile/dbf format explicitly states that the field names must be pure ASCII. What changed in 2.0dev is that we added stricter enforcement of this. This is a known quirk with the shapefile format, which I agree is not ideal.
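To illustrate what that kind of strict check amounts to, here is a minimal sketch. It is not pyshp's actual code; the helper name `check_ascii_field_name` is made up for this example, and it only uses Python's built-in string encoding.

```python
def check_ascii_field_name(name):
    """Return the field name as ASCII bytes.

    Hypothetical helper for illustration only (not part of pyshp).
    Raises ValueError if the name contains any non-ASCII character.
    """
    try:
        return name.encode("ascii")
    except UnicodeEncodeError as exc:
        raise ValueError(f"field name {name!r} is not pure ASCII") from exc
```

A name like `"NAME"` passes through unchanged as bytes, while a Hebrew name is rejected, which is the kind of failure reported above.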

Perhaps we could be more lenient on this for those who really wish to write such files, but I guess it depends on how other shapefile readers handle such files. What's your experience: how does other GIS software read your shapefiles with Hebrew field names? Do the names show up as the encoded ASCII values, or are they displayed correctly as Hebrew characters?

micahcochran commented 5 years ago

Here's an excerpt from ESRI ArcGIS Desktop 9.3 documentation:

"The dBASE file standard only supports ANSI characters in their field names and values. ESRI has added extensive Unicode support for dBASE files to allow you to store Unicode field names and values. But this additional support resides only in ArcGIS and not in non-ESRI applications. Supporting Unicode in dBASE is an ongoing effort at ESRI, meaning that issues continue to be found and resolved.

NOTE: If you have to support Unicode in your field names or field values, we strongly suggest that you use geodatabases rather than shapefiles."

If ESRI software supports this, then support is pretty widespread. It would be interesting to know if GDAL (and by extension QGIS) supports unicode field names for shapefiles. Also, it would be nice to have an example shapefile.

@karimbahgat I can see that supporting unicode fields might cause other problems.

karimbahgat commented 5 years ago

I just checked that fiona (via GDAL, and thus also QGIS) successfully loads the file with Hebrew field names created in ex.py. I have not checked whether it can write them. Given how often this comes up, I will try to bring unicode field names back in. One problem I can see is that unicode field names will very quickly exceed the 10-byte limit on field names and may get cut off.
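As a rough illustration of the length problem (a sketch only, not what pyshp does internally): each Hebrew letter takes 2 bytes in UTF-8, so a 10-byte field name holds at most 5 of them. The helper `fit_field_name` below is hypothetical; it assumes the limit is counted in encoded bytes and truncates without leaving half of a multi-byte character at the end.

```python
def fit_field_name(name, encoding="utf-8", max_bytes=10):
    """Encode a field name and cut it to at most max_bytes bytes.

    Hypothetical helper for illustration only (not part of pyshp).
    """
    encoded = name.encode(encoding)
    if len(encoded) <= max_bytes:
        return encoded
    # Slicing bytes can split a multi-byte character in two. Decoding with
    # errors="ignore" drops the incomplete trailing character, and
    # re-encoding gives a clean byte string that still fits the limit.
    truncated = encoded[:max_bytes].decode(encoding, errors="ignore")
    return truncated.encode(encoding)


# Six Hebrew letters (alef through vav) need 12 bytes in UTF-8.
hebrew_name = "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5"
print(len(hebrew_name.encode("utf-8")))   # byte length before truncation
print(len(fit_field_name(hebrew_name)))   # byte length after truncation
```

Whether silently truncating is better than raising an error is a design choice; truncation can make two long field names collide.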

karimbahgat commented 5 years ago

Included now in version 2.1.0