GeospatialPython / pyshp

This library reads and writes ESRI Shapefiles in pure Python.
MIT License

Reader.shapes() fails on a particular shapefile #119

Open anyeli opened 6 years ago

anyeli commented 6 years ago

I'm trying to read this shapefile:

http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_boundary_lines_land.zip

The shapefile has 461 polyline records; shapefile.Reader will tell you this correctly via len() and shapeType. However, Reader.shapes() gives me 114 records and the last record has a type of 1075695572 instead of 3. When I use Reader.shape() on each individual record, it seems to do the right thing.

I'm not sure if the shapefile is technically broken, but if I can use it by reading it with shape(), I think I should be able to read it with shapes().

GeospatialPython commented 6 years ago

Interesting. Reader.shape() works slightly differently from shapes(). In this particular shapefile, it appears that a shape was deleted between shapes 113 and 114 at some point. shape() asks the shx index file for the offset of the requested shape; if it can't find the shx file, it starts at the first shape record and loops through until it hits the end of the file. The other methods just loop through to the end without checking the shx.

The shapefile spec allows "lazy deletes": you remove a shape by deleting its data and leaving a gap, which the shx index accounts for. That way you don't have to rewrite the entire shapefile. But if you loop through assuming a continuous series of shapes, the gap causes this problem. I think most software rewrites the whole shapefile after edits to avoid this kind of mess; whatever software edited this shapefile used a lazy delete. A more robust approach for pyshp would be to have shapes() wrap shape(), so that it uses the shx file when available. In all this time nobody has presented a file like this, so it's good to have something to test.
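To make the difference concrete, here is a rough sketch in plain Python of the two reading strategies described above. It is not pyshp's actual code: the function names and the tiny in-memory "shapefile" are made up for illustration, and the record payloads are dummies rather than valid polylines. It relies only on the layout from the ESRI shapefile spec (100-byte file headers, 8-byte shx entries holding offset and length in 16-bit words, and 8-byte record headers in the shp).

```python
import io
import struct

HEADER_SIZE = 100  # both .shp and .shx begin with a 100-byte file header


def read_index(shx):
    """Return (byte_offset, byte_length) for each record listed in the .shx."""
    shx.seek(0, io.SEEK_END)
    count = (shx.tell() - HEADER_SIZE) // 8
    shx.seek(HEADER_SIZE)
    entries = []
    for _ in range(count):
        # Offsets and lengths are big-endian and measured in 16-bit words.
        offset_words, length_words = struct.unpack(">ii", shx.read(8))
        entries.append((offset_words * 2, length_words * 2))
    return entries


def shape_types_via_index(shp, shx):
    """Jump to each record through the index, so gaps in the .shp are skipped."""
    types = []
    for offset, _length in read_index(shx):
        shp.seek(offset + 8)  # skip the 8-byte record header
        (shape_type,) = struct.unpack("<i", shp.read(4))
        types.append(shape_type)
    return types


def shape_types_sequential(shp):
    """Walk records back to back, assuming there are no gaps between them."""
    shp.seek(0, io.SEEK_END)
    end = shp.tell()
    shp.seek(HEADER_SIZE)
    types = []
    while shp.tell() < end:
        _recnum, length_words = struct.unpack(">ii", shp.read(8))
        content = shp.read(length_words * 2)
        types.append(struct.unpack("<i", content[:4])[0])
    return types


def build_demo():
    """Hypothetical data: two type-3 records with stale bytes left between them."""
    def record(n):
        # Record header (number, length in words) + shape type + dummy payload.
        return struct.pack(">ii", n, 4) + struct.pack("<i", 3) + b"\x00" * 4

    # Leftover bytes from a "lazily deleted" record, chosen so that they
    # happen to parse as a record with a garbage shape type.
    stale = struct.pack(">ii", 0, 4) + b"\xff" * 8
    shp = bytes(HEADER_SIZE) + record(1) + stale + record(2)
    # The index lists only the two live records (offsets 100 and 132 bytes).
    shx = bytes(HEADER_SIZE) + struct.pack(">ii", 50, 4) + struct.pack(">ii", 66, 4)
    return io.BytesIO(shp), io.BytesIO(shx)


if __name__ == "__main__":
    shp, shx = build_demo()
    print(shape_types_via_index(shp, shx))  # [3, 3]
    print(shape_types_sequential(shp))      # [3, -1, 3]
```

In this toy case the sequential walk picks up a bogus extra record from the gap; with real leftover data it can just as easily lose alignment and stop early, which would be consistent with the behaviour reported above.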

visr commented 5 years ago

> In all this time nobody has presented a file like this so it's good to have something to test.

Ha, I came upon this issue while looking for exactly such a test file, for a dbf reader in Julia. But since they are so hard to find, I decided not to fully support the deleted record marker yet.

It seems that because many readers ignore deleted record markers, writers now more often pack deleted records away when saving. The linked Natural Earth shapefile no longer contains them. QGIS also repacks files after modifications. If you still want a test file, have a look at the attachments in https://issues.qgis.org/issues/11007#note-30.