Toblerity / Fiona

Fiona reads and writes geographic data files
https://fiona.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Fiona > 1.7.5 doesn't open my shapefile - two different errors #510

Closed miker985 closed 6 years ago

miker985 commented 6 years ago

I've run into issues using fiona that I suspect come from an edge case somewhere. I'm not clear whether the problem is in the shapefile and only newer versions of fiona are hitting it, or whether it was introduced by one of the recent commits to fiona.

Expected behavior and actual behavior.

My test-fiona.py script (run on python 2.7 and 3.6)

#!/usr/bin/env python
from __future__ import print_function
import fiona
print(fiona.__version__)
# J is a shared filesystem I have symlink'd for convenience
collection = fiona.open('/Users/miker985/J/WORK/11_geospatial/05_survey shapefile library/Shapefile directory/IDN_DHS_1994.shp')
print("\tOpened")

My test loop over fiona versions

pip install fiona==1.7.0
./test-fiona.py

for MINOR_VER in {1..10}; do
    pip install --upgrade fiona==1.7.${MINOR_VER} &>/dev/null
    ./test-fiona.py
done

For both pythons 2.7 and 3.6:

Steps to reproduce the problem.

I've included the above scripts as an example. I'm still trying to get permission to share IDN_DHS_1994.shp, but as it's a shapefile distributed by DHS I'm not clear if this is possible.

Operating system

Mac OS X 10.12.5.

Fiona version and provenance

As stated above, I've tested fiona 1.7.0 thru 1.7.10. If you need the specific dist that pip is installing I can get you all 10 - just let me know.

jdmcbr commented 6 years ago

@miker985 This looks similar to an issue @ozak observed in #486. @ozak are you still seeing this issue? And, if so, do you have a shapefile you can share that shows the issue?

ozak commented 6 years ago

@jdmcbr sorry I cannot remember what the status of this was on my machines. After encountering the issue I had downgraded fiona so as to not have to fully figure it out since I did not have much time. My tests in #486 suggested it could be a conda or a python version issue, but I did not have time to follow up. One thing I noticed is that if I changed the .cpg file using a text editor and set the encoding, it seemed to work ok.

jdmcbr commented 6 years ago

@ozak Interesting. If you have a shapefile you can share, I can investigate a bit.

sgillies commented 6 years ago

One thing I noticed is that if I changed the .cpg file using a text editor and set the encoding, it seemed to work ok.

@ozak what was the original value in the .cpg file and to what did you change it?

ozak commented 6 years ago

it was empty and I set it to UTF-8.

ozak commented 6 years ago

I have put a shapefile with the changed .cpg here.

sgillies commented 6 years ago

Shapefile .dbf encodings are messy. From what I read above, yours is UTF-8 encoded. If your Python system encoding is UTF-8, things should work as you expect, but it looks to me like your system encoding is 'ANSI 1252'. Adding a .cpg file with the proper encoding is one way out of this. Another is to add the 'encoding' keyword argument when you call fiona.open(): 'encoding="utf-8"' in your case.

ozak commented 6 years ago

But for some reason everything worked OK in previous versions of fiona, and it works perfectly in QGIS. I tried using the keyword argument and I think I got the same error, although I can't remember whether I tried it directly or by passing it to geopandas.

sgillies commented 6 years ago

Here on my computer, where locale.getpreferredencoding() is 'UTF-8', I can read from the shapefile without trouble.

>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> import fiona
>>> with fiona.open('/Users/seang/Downloads/Fiona/FRA.shp') as src:
...     for feat in src:
...         print(feat['properties']['NAM_UNICOD'])
...
Alsace
Aquitaine
Auvergne
Basse-Normandie
Bourgogne
Bretagne
Centre
Champagne-Ardenne
Corse
Franche-Comté
Haute-Normandie
Île-de-France
Limousin
Languedoc-Roussillon
Lorraine
Midi-Pyrénées
Nord-Pas-de-Calais
Provence-Alpes-Côte dʼAzur
Poitou-Charentes
Pays de la Loire
Picardie
Rhône-Alpes

Behavior of fiona definitely has changed on you, and I'm sorry about that. I feel that the current behavior is more correct.

Mainly the problem lies with the provider of your shapefile. UTF-8 isn't strictly legal for shapefiles. Some software like GDAL can handle it with an appropriate .cpg file. QGIS clearly makes different choices than Fiona in the absence of a .cpg file.

With fiona, the way to specify an encoding precisely when opening a file will always be to use the encoding keyword argument.
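
A minimal sketch of that advice, assuming the attribute data really is UTF-8 as in the FRA example. The helper names `locale_matches` and `open_with_encoding` are hypothetical; only `locale.getpreferredencoding()` and the `encoding` keyword of `fiona.open()` come from the discussion here.

```python
import codecs
import locale


def locale_matches(data_encoding):
    """Return True if the locale's preferred encoding equals data_encoding.

    codecs.lookup() normalizes aliases such as 'UTF-8' and 'utf8'.
    """
    preferred = codecs.lookup(locale.getpreferredencoding()).name
    return preferred == codecs.lookup(data_encoding).name


def open_with_encoding(path, data_encoding="utf-8"):
    """Open a shapefile, always stating the encoding explicitly.

    Assumption: the caller knows the real encoding of the .dbf data.
    """
    import fiona  # third-party; imported here so the helper above works without it

    return fiona.open(path, encoding=data_encoding)


# Example use (the path is a placeholder, not a real file):
# if not locale_matches("utf-8"):
#     print("locale default differs from the data encoding")
# with open_with_encoding("path/to/FRA.shp", "utf-8") as src:
#     ...
```

Passing the encoding on every call keeps the result independent of the machine's locale, which is what differed between the computers in this thread.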

sgillies commented 6 years ago

Another observation: with Fiona 1.7.5, SHAPE_ENCODING="ANSI 1252" fio cat ~/Downloads/Fiona/FRA.shp doesn't error but misrepresents data. In the JSON, you can see "Rh\u00c3\u00b4ne-Alpes". This isn't correct.

$ python
Python 3.6.3 (v3.6.3:2c5fed86e0, Oct  3 2017, 00:32:08)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "Rh\u00c3\u00b4ne-Alpes"
'RhÃ´ne-Alpes'

The proper representation of ô is \u00f4, as in:

>>> import json
>>> json.dumps('Rhône-Alpes')
'"Rh\\u00f4ne-Alpes"'
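
The mangled JSON is what you would get if the UTF-8 bytes of the name were decoded as Windows-1252. The following stdlib-only sketch reproduces that effect; the use of Python's `cp1252` codec as a stand-in for 'ANSI 1252' is an assumption, and the variable names are illustrative.

```python
import json

original = "Rhône-Alpes"

# ô (U+00F4) is two bytes in UTF-8: 0xC3 0xB4.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as cp1252 turns each byte into its own character.
mangled = utf8_bytes.decode("cp1252")
print(json.dumps(mangled))   # "Rh\u00c3\u00b4ne-Alpes"
print(json.dumps(original))  # "Rh\u00f4ne-Alpes"

# Reversing the wrong decode recovers the original text.
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```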

With Fiona 1.7.10, fio-cat raises an exception instead of mangling the character.

$ SHAPE_ENCODING="ANSI 1252" fio cat ~/Downloads/Fiona/FRA.shp
ERROR:fio:Exception caught during processing
Traceback (most recent call last):
  File "/Users/seang/envs/fio_issue510/lib/python3.6/site-packages/fiona/fio/cat.py", line 71, in cat
    with fiona.open(path, layer=lyr) as src:
  File "/Users/seang/envs/fio_issue510/lib/python3.6/site-packages/fiona/__init__.py", line 165, in open
    enabled_drivers=enabled_drivers)
  File "/Users/seang/envs/fio_issue510/lib/python3.6/site-packages/fiona/collection.py", line 153, in __init__
    self.session.start(self)
  File "fiona/ogrext.pyx", line 395, in fiona.ogrext.Session.start
  File "fiona/_err.pyx", line 185, in fiona._err.GDALErrCtxManager.__exit__
fiona._err.CPLE_AppDefinedError: b'Recode from ANSI 1252 to UTF-8 failed with the error: "Invalid argument".'
Aborted!

jdmcbr commented 6 years ago

@sgillies I was playing around with some different shapefile encodings on a computer that doesn't have utf-8 as the preferred encoding, and got some weird behavior. The following works fine:

utf_file = 'path/to/file/has_utf_cpg.shp'
non_utf_file = 'path/to/file/has_non_utf_cpg.shp'

with fiona.open(utf_file, encoding='utf-8') as f:
    pass

with fiona.open(non_utf_file, encoding='utf-8') as f:
    pass

However, this crashes with the CPLE_AppDefinedError:

non_utf_file = 'path/to/file/has_non_utf_cpg.shp'

with fiona.open(non_utf_file, encoding='utf-8') as f:
    pass

I spent a little time trying to see how this could possibly happen, without success.

miker985 commented 6 years ago

The FRA shapefile triggers the error I was getting (fiona version 1.7.10) IFF I set FRA.cpg to ANSI 1252. If that file is empty or removed FRA.shp opens correctly.

echo UTF-8 > IDN_DHS_1994.cpg also fixes my error. Originally this value was set to ANSI 1252 so perhaps the encoding is actually incorrect. I'll need to verify the correct encoding but that's a separate issue.
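
The same .cpg edit can be sketched in Python. The function name `set_cpg_encoding` is hypothetical, and the sketch assumes the sidecar shares the shapefile's base name and holds nothing but the encoding name.

```python
from pathlib import Path


def set_cpg_encoding(shp_path, encoding="UTF-8"):
    """Write `encoding` to the .cpg sidecar next to `shp_path`.

    Returns the previous content of the .cpg file, or None if there was
    no such file, so the old value can be restored later.
    """
    cpg_path = Path(shp_path).with_suffix(".cpg")
    previous = None
    if cpg_path.exists():
        previous = cpg_path.read_text(encoding="ascii", errors="replace").strip()
    # Same effect as `echo UTF-8 > file.cpg`, including the trailing newline.
    cpg_path.write_text(encoding + "\n", encoding="ascii")
    return previous


# Example use:
# old = set_cpg_encoding("IDN_DHS_1994.shp", "UTF-8")
# print("previous .cpg value:", old)
```

Returning the old value makes it easy to put the file back if UTF-8 turns out not to be the correct encoding.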

Is there any way to override the .cpg file and supply a manual encoding?

Thank you very much @ozak for the example file.

sgillies commented 6 years ago

@miker985 @ozak thank you for your patience and feedback. I believe I've found a bug. Work on a fix will continue in #512.

sgillies commented 6 years ago

@miker985 @ozak Fiona 1.7.11 is on PyPI now and has the fix. Thanks for your help 🙏