JuliaGeo / Shapefile.jl

Parsing .shp files in Julia
http://juliageo.org/Shapefile.jl/
MIT License

Allow Shapefile to read `.zip` files #113

Closed: asinghvi17 closed this pull request 2 months ago

asinghvi17 commented 2 months ago

This is an implementation of #75 that "dispatches" on the provided file name. Ideally, this should also work with some kind of streaming type, so that we could pass that stream to ZipFile.jl and read shapefiles stored in the cloud directly, but this will do for now.

This PR adds a package extension, `ShapefileZipFilesExt`, which provides a method for `_read_shp_from_zipfile`, a function that Shapefile.jl declares without any methods. This also lets us show a helpful error message if the extension is not loaded.
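The mechanism described above can be sketched in a single script, with plain modules standing in for the package and its extension. All names here (`MiniShapefile`, `read_table`, `MiniShapefileZipExt`) are hypothetical and are not Shapefile.jl's actual API, and the "extension" just returns a string instead of calling ZipFile.jl. In a real package, the extension module would live under `ext/` and be declared in `Project.toml` under `[weakdeps]` and `[extensions]` (Julia 1.9+), so that Julia loads it automatically once ZipFile.jl is loaded. The error hint uses `Base.Experimental.register_error_hint`, which, as its name says, is an experimental API.

```julia
# Stand-in for the main package (hypothetical names, not Shapefile.jl's API).
module MiniShapefile

# Declared with no methods; an extension is expected to add one.
function _read_shp_from_zipfile end

# "Dispatch" on the file name: `.zip` paths go to the extension-provided method.
function read_table(path::AbstractString)
    if lowercase(last(splitext(path))) == ".zip"
        return _read_shp_from_zipfile(path)
    end
    return "read shapefile directly: $path"
end

function __init__()
    # Append a hint to the MethodError thrown when no extension method exists.
    Base.Experimental.register_error_hint(MethodError) do io, exc, argtypes, kwargs
        if exc.f === _read_shp_from_zipfile
            print(io, "\nReading .zip files requires ZipFile.jl; run `using ZipFile` first.")
        end
    end
end

end # module MiniShapefile

# Before the "extension" is defined, reading a .zip path fails with a MethodError.
no_ext_error = try
    MiniShapefile.read_table("roads.zip")
    nothing
catch e
    e
end

# Stand-in for the extension module; in a real package this lives in ext/
# and is loaded automatically once the weak dependency (ZipFile.jl) is loaded.
module MiniShapefileZipExt
import ..MiniShapefile
MiniShapefile._read_shp_from_zipfile(path::AbstractString) = "read from zip: $path"
end
```

Printing `no_ext_error` with `showerror` includes the hint text, while after the extension module is defined, `MiniShapefile.read_table("roads.zip")` returns `"read from zip: roads.zip"`. Declaring the stub in the main package is what allows the extension to add a method to it and the main package to refer to it in the error hint.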

CC: @dgleich

asinghvi17 commented 2 months ago

Test failures seem a bit strange...

dgleich commented 2 months ago

Oh, this is awesome! Thanks for pushing it through!

rafaqz commented 2 months ago

About those strange errors:

https://github.com/JuliaData/DBFTables.jl/pull/33/files#diff-9692c69b0f85888c0a7fdca8e78c1f3f9f1f41d811d5b4491c5f44e4540210b8R457

asinghvi17 commented 2 months ago

Ah! Should I fix the tests then?

rafaqz commented 2 months ago

Yes! Let's look out for small breaking changes like that next time and bump the minor version ;)

asinghvi17 commented 2 months ago

CI passes now. Should we merge and release? I've bumped the minor version (so this is technically a breaking release), even though nothing should actually break.
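Under Julia's Pkg compat rules, whether a minor-version bump counts as breaking depends on the major version: before 1.0, the minor number plays the role of the major number. The "technically breaking" remark therefore assumes the package was still on a 0.x release, which is an assumption here rather than something stated in the thread. A minimal sketch of that caret rule using Base's `VersionNumber` (`is_breaking` is a hypothetical helper, and it ignores prerelease and build tags):

```julia
# Sketch of the caret-compatibility rule Julia's Pkg applies to [compat] entries:
# a change in the first nonzero component of the version is breaking.
function is_breaking(old::VersionNumber, new::VersionNumber)
    if old.major != new.major
        return true                      # e.g. 1.x -> 2.x, 0.x -> 1.x
    elseif old.major == 0 && old.minor != new.minor
        return true                      # e.g. 0.10.x -> 0.11.0
    elseif old.major == 0 && old.minor == 0 && old.patch != new.patch
        return true                      # e.g. 0.0.3 -> 0.0.4
    end
    return false                         # e.g. 1.2.3 -> 1.3.0 or 0.10.3 -> 0.10.4
end

is_breaking(v"0.10.3", v"0.11.0")  # true
is_breaking(v"1.2.3", v"1.3.0")    # false
```

This is why downstream packages with a compat entry like `"0.10"` would not pick up a 0.11 release automatically, while a 1.x package's minor release would be picked up.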

rafaqz commented 2 months ago

What about .gz and every other archive format? Isn't .zip pretty Windows-centric?

asinghvi17 commented 2 months ago

It seems that ZipFile.jl only handles .zip files. We could switch to TranscodingStreams.jl, but I have no idea how (or whether) it handles archives containing multiple files...

dgleich commented 2 months ago

Many sources distribute shapefiles as zip files (e.g., this seems to be how the US government does it), so this is useful as is: you can download the zip file and read it directly.

(My original use case involved working with a few hundred such zip files, none of which I really wanted to unzip...)

The tar file situation is less simple: there isn't (yet) a straightforward way to read a tar.gz file as a list of Julia IO objects, as there is for zip files (see, e.g., https://github.com/JuliaIO/Tar.jl/pull/95). So I wouldn't worry about .tar.gz files right now.
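Part of why .gz is a different problem from .zip: a zip file is itself a multi-file archive, while gzip compresses a single stream, so a multi-file shapefile dataset compressed with gzip usually arrives as a .tar.gz (tar for the file list, gzip for compression). A minimal sketch that tells the two apart by their leading magic bytes rather than by file extension, using only Base: `PK\x03\x04` is the zip local-file-header signature and `0x1f 0x8b` is the gzip magic number. `sniff_archive` is a hypothetical helper, not part of Shapefile.jl or ZipFile.jl, and it only checks the local-file-header signature, so an empty zip archive would be reported as `:unknown`.

```julia
# Hypothetical helper: classify a stream by its magic bytes without consuming them.
function sniff_archive(io::IO)
    mark(io)              # remember the current position
    header = read(io, 4)  # reads at most 4 bytes
    reset(io)             # rewind so the caller can still parse the whole stream
    if header == UInt8[0x50, 0x4b, 0x03, 0x04]         # "PK\x03\x04": zip entry
        return :zip
    elseif length(header) >= 2 && header[1:2] == UInt8[0x1f, 0x8b]  # gzip magic
        return :gzip
    end
    return :unknown
end

# Convenience method for a path on disk; `open(f, path)` closes the file afterwards.
sniff_archive(path::AbstractString) = open(sniff_archive, path)
```

Note that a `:gzip` result says nothing about whether the decompressed stream is a tar archive; that would need a separate check after decompression.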

rafaqz commented 2 months ago

Makes sense. If zip covers 95% of files and is the only format that's easy to support, then that's fine.

asinghvi17 commented 2 months ago

Do we need more reviews, or should I merge and release?