Closed: miker985 closed this issue 6 years ago.
@miker985 This looks similar to an issue @ozak observed in #486. @ozak are you still seeing this issue? And, if so, do you have a shapefile you can share that shows the issue?
@jdmcbr Sorry, I cannot remember what the status of this was on my machines. After encountering the issue I downgraded fiona so that I wouldn't have to fully figure it out, since I did not have much time. My tests in #486 suggested it could be a conda or Python version issue, but I did not have time to follow up. One thing I noticed is that if I edited the .cpg file in a text editor and set the encoding, it seemed to work OK.
@ozak Interesting. If you have a shapefile you can share, I can investigate a bit.
> One thing I noticed is that if I changed the .cpg file using a text editor and set the encoding, it seemed to work ok.
@ozak what was the original value in the .cpg file and to what did you change it?
It was empty and I set it to UTF-8.
Shapefile .dbf encodings are messy. From what I read above, yours is UTF-8 encoded. If your Python system encoding is UTF-8, things should work as you expect, but it looks to me like your system encoding is 'ANSI 1252'. Adding a .cpg file with the proper encoding is one way out of this. Another is to pass the 'encoding' keyword argument when you call fiona.open(): 'encoding="utf-8"' in your case.
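For the .cpg route, a minimal sketch: the helper below writes a one-line .cpg sidecar next to the .shp, declaring the encoding of the matching .dbf. The name write_cpg is hypothetical and not part of fiona; the other route needs no helper at all, just fiona.open(path, encoding="utf-8").

```python
from pathlib import Path


def write_cpg(shp_path, encoding="UTF-8"):
    """Write (or overwrite) the .cpg sidecar that declares a shapefile's
    .dbf encoding. Hypothetical helper, not part of fiona."""
    cpg_path = Path(shp_path).with_suffix(".cpg")
    # A .cpg file holds nothing but the encoding name, e.g. "UTF-8".
    cpg_path.write_text(encoding, encoding="ascii")
    return cpg_path
```

For example, write_cpg("FRA.shp") creates FRA.cpg containing UTF-8. Only do this when you know the .dbf really is UTF-8; a .cpg that names the wrong encoding just moves the problem.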
But for some reason everything worked OK in previous versions of fiona, and it works perfectly in QGIS. I tried using the keyword argument and I think I got the same error, although I can't remember whether I tried it directly or by passing it through geopandas.
Here on my computer, where locale.getpreferredencoding() is 'UTF-8', I can read from the shapefile without trouble.
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> import fiona
>>> with fiona.open('/Users/seang/Downloads/Fiona/FRA.shp') as src:
...     for feat in src:
...         print(feat['properties']['NAM_UNICOD'])
...
Alsace
Aquitaine
Auvergne
Basse-Normandie
Bourgogne
Bretagne
Centre
Champagne-Ardenne
Corse
Franche-Comté
Haute-Normandie
Île-de-France
Limousin
Languedoc-Roussillon
Lorraine
Midi-Pyrénées
Nord-Pas-de-Calais
Provence-Alpes-Côte dʼAzur
Poitou-Charentes
Pays de la Loire
Picardie
Rhône-Alpes
Fiona's behavior has definitely changed on you, and I'm sorry about that. I feel that the current behavior is more correct.
The problem lies mainly with the provider of your shapefile. UTF-8 isn't strictly legal for shapefiles, though some software, like GDAL, can handle it given an appropriate .cpg file. QGIS clearly makes different choices than Fiona in the absence of a .cpg file.
With fiona, the way to specify an encoding precisely when opening a file will always be the encoding keyword argument.
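One way to keep that choice explicit is to resolve the declared encoding yourself and hand it to fiona.open's encoding keyword. A minimal sketch, assuming the .cpg values you meet are among a few common spellings; encoding_from_cpg and CPG_ALIASES are hypothetical names, and the alias table is illustrative, not taken from GDAL. Python's codecs module doesn't recognize the GDAL-style name "ANSI 1252" directly, hence the table.

```python
import codecs
from pathlib import Path

# Assumed, illustrative mapping from a few .cpg spellings to Python
# codec names; it is not exhaustive and is not taken from GDAL.
CPG_ALIASES = {
    "ANSI 1252": "cp1252",
    "1252": "cp1252",
    "UTF-8": "utf-8",
    "UTF8": "utf-8",
}


def encoding_from_cpg(shp_path, default=None):
    """Return a Python codec name declared by a shapefile's .cpg file.

    Hypothetical helper. Falls back to `default` when the .cpg is
    missing, empty, or names an encoding Python does not know.
    """
    cpg_path = Path(shp_path).with_suffix(".cpg")
    if not cpg_path.exists():
        return default
    raw = cpg_path.read_text(encoding="ascii", errors="replace").strip()
    if not raw:
        # An empty .cpg declares nothing useful.
        return default
    name = CPG_ALIASES.get(raw.upper(), raw)
    try:
        # Normalize to Python's canonical codec name, e.g. "cp1252".
        return codecs.lookup(name).name
    except LookupError:
        return default
```

Usage would be along the lines of fiona.open(path, encoding=encoding_from_cpg(path, default="utf-8")), so the decision is visible in your code rather than left to defaults.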
Another observation: with Fiona 1.7.5, SHAPE_ENCODING="ANSI 1252" fio cat ~/Downloads/Fiona/FRA.shp doesn't error but misrepresents the data. In the JSON, you can see "Rh\u00c3\u00b4ne-Alpes". This isn't correct.
$ python
Python 3.6.3 (v3.6.3:2c5fed86e0, Oct 3 2017, 00:32:08)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "Rh\u00c3\u00b4ne-Alpes"
'RhÃ´ne-Alpes'
The proper representation of ô is \u00f4, as in:
>>> import json
>>> json.dumps('Rhône-Alpes')
'"Rh\\u00f4ne-Alpes"'
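To see where "Rh\u00c3\u00b4ne-Alpes" comes from, the snippet below reproduces the mangling in plain Python: the UTF-8 bytes of the name are decoded as cp1252, which is Python's name for the ANSI 1252 code page. It sketches the symptom only, not Fiona's or GDAL's internals.

```python
import json

good = "Rhône-Alpes"

# The UTF-8 bytes for "ô" are C3 B4; decoding them as cp1252 ("ANSI 1252")
# yields two characters, "Ã" (U+00C3) and "´" (U+00B4).
mangled = good.encode("utf-8").decode("cp1252")

print(mangled)              # RhÃ´ne-Alpes
print(json.dumps(mangled))  # "Rh\u00c3\u00b4ne-Alpes"
print(json.dumps(good))     # "Rh\u00f4ne-Alpes"

# The damage is reversible as long as no bytes were lost along the way.
assert mangled.encode("cp1252").decode("utf-8") == good
```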
With Fiona 1.7.10, fio cat raises an exception instead of mangling the character.
$ SHAPE_ENCODING="ANSI 1252" fio cat ~/Downloads/Fiona/FRA.shp
ERROR:fio:Exception caught during processing
Traceback (most recent call last):
  File "/Users/seang/envs/fio_issue510/lib/python3.6/site-packages/fiona/fio/cat.py", line 71, in cat
    with fiona.open(path, layer=lyr) as src:
  File "/Users/seang/envs/fio_issue510/lib/python3.6/site-packages/fiona/__init__.py", line 165, in open
    enabled_drivers=enabled_drivers)
  File "/Users/seang/envs/fio_issue510/lib/python3.6/site-packages/fiona/collection.py", line 153, in __init__
    self.session.start(self)
  File "fiona/ogrext.pyx", line 395, in fiona.ogrext.Session.start
  File "fiona/_err.pyx", line 185, in fiona._err.GDALErrCtxManager.__exit__
fiona._err.CPLE_AppDefinedError: b'Recode from ANSI 1252 to UTF-8 failed with the error: "Invalid argument".'
Aborted!
@sgillies I've been playing around with different shapefile encodings on a computer that doesn't have UTF-8 as the preferred encoding, and I got some weird behavior. The following works fine:
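import fiona

utf_file = 'path/to/file/has_utf_cpg.shp'
non_utf_file = 'path/to/file/has_non_utf_cpg.shp'

# Opening the file with a UTF-8 .cpg first...
with fiona.open(utf_file, encoding='utf-8') as f:
    pass

# ...and then the file with a non-UTF-8 .cpg: no error.
with fiona.open(non_utf_file, encoding='utf-8') as f:
    pass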
However, opening the non-UTF-8 file on its own, without first opening the UTF-8 one, crashes with the CPLE_AppDefinedError:
import fiona

non_utf_file = 'path/to/file/has_non_utf_cpg.shp'
with fiona.open(non_utf_file, encoding='utf-8') as f:
    pass
I spent a little time trying to see how this could possibly happen, without success.
The FRA shapefile triggers the error I was getting (fiona version 1.7.10) if and only if I set FRA.cpg to ANSI 1252. If that file is empty or removed, FRA.shp opens correctly.
Running echo UTF-8 > IDN_DHS_1994.cpg also fixes my error. Originally this value was set to ANSI 1252, so perhaps the declared encoding is actually incorrect. I'll need to verify the correct encoding, but that's a separate issue.
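One rough way to check what a .dbf really contains is to test whether its record bytes decode strictly as UTF-8. A minimal sketch, assuming the file fits in memory and uses the usual dBase layout, where bytes 8-9 of the header hold the header length as a little-endian integer; guess_dbf_encoding is a hypothetical helper, and it cannot tell cp1252 apart from other single-byte code pages.

```python
import struct
from pathlib import Path


def guess_dbf_encoding(dbf_path):
    """Rough guess at the text encoding of a .dbf (hypothetical helper).

    Returns "ascii" if the record bytes are all 7-bit, "utf-8" if they
    decode strictly as UTF-8, and None otherwise (for example a
    single-byte code page such as cp1252). Treat the result as a hint:
    some cp1252 byte sequences also happen to be valid UTF-8.
    """
    data = Path(dbf_path).read_bytes()
    # Bytes 8-9 of the dBase header hold its total length (little-endian);
    # what follows is record data, stored as text for the field types
    # shapefiles typically use (character, numeric, date, logical).
    (header_len,) = struct.unpack_from("<H", data, 8)
    records = data[header_len:]
    if records.isascii():
        return "ascii"
    try:
        records.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"
```

If this returns "utf-8" for IDN_DHS_1994.dbf, the ANSI 1252 value in the original .cpg is likely wrong; if it returns None, the .cpg may have been right after all.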
Is there any way to override the .cpg file and supply an encoding manually?
Thank you very much, @ozak, for the example file.
@miker985 @ozak thank you for your patience and feedback. I believe I've found a bug. Work on a fix will continue in #512.
@miker985 @ozak Fiona 1.7.11 is on PyPI now and has the fix. Thanks for your help 🙏
I've run into issues using fiona that I suspect are hitting an edge case somewhere. I'm not sure whether this is a problem with the shapefile that only newer versions of fiona are hitting, or a problem with one of the recent commits to fiona.
Expected behavior and actual behavior.
My test-fiona.py script (run on Python 2.7 and 3.6):

My test script
Expected: for both Python 2.7 and 3.6, fiona.open succeeds and I get a collection object back.

Actual:

TypeError: __init__() takes exactly 4 positional arguments (2 given)
fiona._err.CPLE_AppDefinedError: b'Recode from ANSI 1252 to UTF-8 failed with the error: "Invalid argument".'

I've also tried passing the encoding= keyword arg to fiona.open and see no change. Specifically, I see the same error message regardless of the encoding (even if the encoding is invalid, e.g., encoding='WAT').

Steps to reproduce the problem.
I've included the above scripts as an example. I'm still trying to get permission to share IDN_DHS_1994.shp, but since it's a shapefile distributed by DHS I'm not sure whether this is possible.

Operating system
Mac OS X 10.12.5.
Fiona version and provenance
As stated above, I've tested fiona 1.7.0 through 1.7.10. If you need the specific dist that pip is installing, I can get you all 10; just let me know.