GeospatialPython / pyshp

This library reads and writes ESRI Shapefiles in pure Python.
MIT License

Open dbf with pyshp #195

Closed (postfalk closed 3 years ago)

postfalk commented 4 years ago

Some geospatial projects distribute their geometries as shapefiles and keep additional attributes in separate .dbf files. The National Hydrography Dataset is one case in point (http://www.horizon-systems.com/NHDPlus/NHDPlusV2_home.php). It would be very handy if this data could be loaded with the same library, especially since other dbf or geospatial libraries don't support loading from (several) io.BytesIO objects, which my project requires because the data is stored in S3.

Reading the code, I came to the following conclusion. I have three options: implement my own version of https://code.activestate.com/recipes/362715-dbf-reader-and-writer/ (on which pyshp relies); remove the check for whether the shp part is present and generate NULL geometries instead; or make the loading methods for the different parts of a shapefile public and use only the one for the dbf part.
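For illustration, the first option could look roughly like the sketch below. It is a minimal, standard-library-only reader for dBASE III style .dbf data held in an io.BytesIO object. It is neither pyshp's code nor the ActiveState recipe; the names read_dbf and make_sample are hypothetical, all values are returned as stripped strings, and memo fields, code pages, and type conversion are deliberately ignored.

```python
import io
import struct

HEADER_FMT = "<BBBBIHH20x"   # version, yy, mm, dd, record count, header len, record len
FIELD_FMT = "<11sc4xBB14x"   # name, type, field length, decimal count


def read_dbf(stream):
    """Yield one dict per non-deleted record from a binary .dbf stream."""
    header = stream.read(struct.calcsize(HEADER_FMT))
    _ver, _yy, _mm, _dd, num_records, header_len, record_len = struct.unpack(
        HEADER_FMT, header
    )

    # The header is 32 bytes, followed by 32 bytes per field and a 0x0D terminator.
    num_fields = (header_len - 33) // 32
    fields = []
    for _ in range(num_fields):
        name, ftype, size, _decimals = struct.unpack(FIELD_FMT, stream.read(32))
        name = name.split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, ftype.decode("ascii"), size))

    stream.seek(header_len)
    for _ in range(num_records):
        record = stream.read(record_len)
        if len(record) < record_len:
            break                      # truncated file
        if record[:1] == b"*":
            continue                   # record is flagged as deleted
        row, offset = {}, 1            # skip the one-byte deletion flag
        for name, _ftype, size in fields:
            row[name] = record[offset:offset + size].decode("latin-1").strip()
            offset += size
        yield row


def make_sample():
    """Build a tiny two-field, two-record .dbf in memory (hypothetical test data)."""
    fields = [("NAME", "C", 8, 0), ("FLOW", "N", 5, 0)]
    header_len = 32 + 32 * len(fields) + 1
    record_len = 1 + sum(f[2] for f in fields)
    rows = [b" " + b"Russian " + b"   42", b"*" + b"Deleted " + b"    0"]
    buf = struct.pack(HEADER_FMT, 3, 121, 1, 1, len(rows), header_len, record_len)
    for name, ftype, size, dec in fields:
        buf += struct.pack(FIELD_FMT, name.encode("ascii"), ftype.encode("ascii"), size, dec)
    return buf + b"\r" + b"".join(rows) + b"\x1a"


if __name__ == "__main__":
    for row in read_dbf(io.BytesIO(make_sample())):
        print(row)
```

Because the reader only calls read() and seek(), it should work on any seekable binary file-like object, including a BytesIO filled from an S3 download.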

Also, the docstring of the Reader class states that it can be initialized without any files so that .load() can be called later. That no longer seems to be the case, since __init__ raises an error if self.shp and self.dbf are both None.

karimbahgat commented 3 years ago

Apologies for the very late reply here. Loading standalone dbf or shp files from URL sources should be possible, as stated in the README, simply by passing the BytesIO or other file-like object (in your case, one holding the data read from a URL) as a keyword argument:

dbfReader = shapefile.Reader(dbf=dbfBytesObject)
shpReader = shapefile.Reader(shp=shpBytesObject)

To conform with the docs, I've now also removed the constructor error raised when no shp or dbf is detected. Any error is now delayed until the data is actually requested, so an empty reader can be created and loaded with .load() later on. I also added tests for this behavior, as well as tests for loading separate shp/shx/dbf files.
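The deferred-error behavior described above can be pictured with a small, generic sketch. This is not pyshp's Reader or its actual implementation; the class LazyDbfSource and its methods are hypothetical and only show the pattern: the constructor accepts nothing, load() attaches a stream later, and an error is raised only when data is requested without a stream.

```python
import io


class LazyDbfSource:
    """Hypothetical holder for a dbf stream that validates on access, not on init."""

    def __init__(self, dbf=None):
        # No check here: an empty instance is allowed.
        self.dbf = dbf

    def load(self, dbf):
        # Attach (or replace) the stream after construction.
        self.dbf = dbf

    def header_bytes(self):
        # The error is delayed until the data is actually needed.
        if self.dbf is None:
            raise ValueError("no dbf stream has been loaded yet")
        self.dbf.seek(0)
        return self.dbf.read(32)


if __name__ == "__main__":
    source = LazyDbfSource()               # does not raise
    source.load(io.BytesIO(b"\x03" + b"\x00" * 31))
    print(len(source.header_bytes()))
```

The trade-off is that a misconfigured reader fails later, at first use, in exchange for allowing the two-step create-then-load workflow that the docstring promises.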