GeospatialPython / pyshp

This library reads and writes ESRI Shapefiles in pure Python.
MIT License

ValueError when reading shapefile from ZIP archive #239

Closed stuckyb closed 2 years ago

stuckyb commented 2 years ago

This is such a weird bug that I'm hesitant to even open an issue, but here goes. In words: when reading a shapefile from a ZIP archive by manually providing the shp and dbf file-like objects, subsequent attempts to read the shapefile raise ValueError: seek of closed file if two conditions hold: (a) a zipfile.Path object is created that references a non-existent item in the ZIP archive, and (b) all of this is done from within a function.

Here is minimal code to trigger the bug (I will also upload a simple shapefile to use with this example):

import zipfile
import shapefile

def _openZippedShapefile(zf):
    shp_path = 'shapefile_pnt-no_crs.shp'
    dbf_path = 'shapefile_pnt-no_crs.dbf'
    prj_path = zipfile.Path(zf, at='shapefile_pnt-no_crs.prj')

    sfr = shapefile.Reader(
        shp=zf.open(shp_path, mode='r'),
        dbf=zf.open(dbf_path, mode='r')
    )
    #print(sfr.__geo_interface__)

    return sfr

zf = zipfile.ZipFile('shapefile_pnt-no_crs-no_dir.zip', mode='r')
sfr = _openZippedShapefile(zf)
#print(len(sfr.shp.read(200)))
#print(len(sfr.dbf.read(200)))

print(sfr.__geo_interface__)

Key points:

  1. If the code is run as written above, it results in ValueError: seek of closed file.
  2. If line 7 (where prj_path is defined) is commented out, everything works as expected. (The ZIP archive does not actually contain a .prj file.)
  3. If the print() statement within the function is uncommented, it works even if line 7 is enabled.
  4. If the print(read()) statements near the bottom are uncommented, they produce the same results regardless of whether line 7 is enabled.

I am honestly not certain whether this is a bug in shapefile or in Python's zipfile library, but key point 4 above makes me think that pyshp is involved somehow.

Defining a zipfile.Path object that references a non-existent file might seem pointless, but it is one way to test whether a file (.prj file, e.g.) actually exists in an archive.
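For comparison, here is a minimal sketch of an existence check that does not create a zipfile.Path object at all. It relies only on ZipFile.namelist() from the standard library; the helper name member_exists and the throwaway in-memory archive are assumptions made for illustration, not part of the original report.

```python
import io
import zipfile

def member_exists(zf, name):
    """Return True if `name` is an entry in the open ZipFile `zf`.

    Hypothetical helper: it only inspects the archive's name list,
    so no zipfile.Path object is ever created.
    """
    return name in zf.namelist()

# Build a small in-memory archive that, like the one in the report,
# contains .shp and .dbf entries but no .prj entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    zf.writestr('shapefile_pnt-no_crs.shp', b'')
    zf.writestr('shapefile_pnt-no_crs.dbf', b'')

with zipfile.ZipFile(buf, mode='r') as zf:
    has_shp = member_exists(zf, 'shapefile_pnt-no_crs.shp')
    has_prj = member_exists(zf, 'shapefile_pnt-no_crs.prj')
```

Whether this sidesteps the error on the affected Python versions has not been verified here; it simply avoids the zipfile.Path call that the report identifies as the trigger.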

And I wanted to be sure to also thank you for providing this outstanding library!

stuckyb commented 2 years ago

And here is a very simple shapefile for use with the above example: shapefile_pnt-no_crs-no_dir.zip.

stuckyb commented 2 years ago

One more piece of information. I just tried running the above code with Python 3.10.4, and it appears to work as expected. This strongly suggests (to me, anyway) that it is, or was, a problem with Python's zipfile library. So perhaps this is a non-issue with more recent Python releases. I'll leave this issue open in case any project maintainers would like to take a look.

karimbahgat commented 2 years ago

Glad you're enjoying the library. I'm not able to dive too deep into the issue, but it does seem to be a bug in zipfile.Path, which was indeed fixed in Python 3.10: https://bugs.python.org/issue40564. The bug seems to be that zipfile.Path closes the ZIP file prematurely, after which trying to read the shapefile from the archive fails, although there are probably more details to it than that.
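If the archive's underlying file really is being closed early, one possible workaround on older Python versions is to copy each member into its own in-memory buffer, so that the reader no longer depends on the ZipFile handle staying open. The sketch below uses only the standard library; the helper name load_member and the example file names are assumptions for illustration, and the approach has not been tested against the affected Python versions.

```python
import io
import zipfile

def load_member(zf, name):
    """Copy one archive member into an independent, seekable buffer.

    Hypothetical helper: ZipFile.read() returns the member's bytes,
    and wrapping them in BytesIO gives a file-like object that stays
    usable after the archive itself has been closed.
    """
    return io.BytesIO(zf.read(name))

# Throwaway in-memory archive standing in for a zipped shapefile.
archive = io.BytesIO()
with zipfile.ZipFile(archive, mode='w') as zf:
    zf.writestr('example.shp', b'shp-bytes')
    zf.writestr('example.dbf', b'dbf-bytes')

with zipfile.ZipFile(archive, mode='r') as zf:
    shp_buf = load_member(zf, 'example.shp')
    dbf_buf = load_member(zf, 'example.dbf')

# The archive is closed at this point, but the buffers can still be read.
shp_data = shp_buf.read()
shp_buf.seek(0)  # rewind so a later consumer starts at the beginning

# With pyshp installed, the buffers could then be passed the same way
# as in the original report:
#   sfr = shapefile.Reader(shp=shp_buf, dbf=dbf_buf)
```

The trade-off is that each member is held fully in memory, which is fine for small shapefiles but may not suit very large ones.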