Nowosad / spData

Datasets for spatial analysis
https://jakubnowosad.com/spData/

Storing data in a cross-language form #71

Open martinfleis opened 1 day ago

martinfleis commented 1 day ago

Hi,

would you be keen on storing the data in some open formats alongside rda, so we could link to it from Python? We have a geodatasets package that holds metadata and some tooling to cache the data locally. If you included the data here as GeoJSON, CSV, GPKG or whatever is needed, we could include them in geodatasets, allowing easier access to the same data from R and Python and avoiding the need to run R first to save the data in a form Python can read.

Robinlovelace commented 1 day ago

It would be really useful to have cross-language datasets. Maybe a spDatapy or spDatax repo could be worthwhile, to avoid issues with CRAN.

Nowosad commented 1 day ago

@martinfleis what do you have in mind? Do you want to store the files in some python package? Many of the datasets from spData are available in inst/shapes -- https://github.com/Nowosad/spData/tree/master/inst/shapes (although we plan to remove shapefiles soon from there -- https://github.com/Nowosad/spData/issues/62). Do you need any other dataset from spData as a file?

martinfleis commented 1 day ago

> Many of the datasets from spData are available in inst/shapes

Missed that! That is what I was looking for. If these links are considered stable, I would just include them in geodatasets for easy access from Python.
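A minimal sketch of what linking to those files from Python could look like, using only the standard library. It is not how geodatasets itself is implemented. The raw-file URL pattern, the helper names `shapes_url` and `fetch_cached`, and the file name `example.gpkg` are assumptions for illustration; substitute a file that actually exists in inst/shapes.

```python
import urllib.request
from pathlib import Path

# Assumption: files under inst/shapes on the master branch are served from
# GitHub's raw-content host at this base path.
RAW_BASE = "https://raw.githubusercontent.com/Nowosad/spData/master/inst/shapes"


def shapes_url(filename: str) -> str:
    """Build the raw download URL for a file in inst/shapes (hypothetical helper)."""
    return f"{RAW_BASE}/{filename}"


def fetch_cached(filename: str, cache_dir: Path) -> Path:
    """Download the file once and reuse the local copy on later calls."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if not target.exists():
        # Only hit the network when the file is not cached yet.
        urllib.request.urlretrieve(shapes_url(filename), target)
    return target


# "example.gpkg" is a placeholder name, not a confirmed file in the repository.
print(shapes_url("example.gpkg"))
```

Because the cache check happens before any download, repeated runs depend on the links staying stable only for the first fetch.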

Nowosad commented 1 day ago

Yes, they are very stable (except the .shp files, which will be removed in ~two months).

Nowosad commented 1 day ago

@martinfleis .gal files are at https://github.com/Nowosad/spData/tree/master/inst/weights

martinfleis commented 17 hours ago

I have exposed the datasets that live in inst/shapes through geodatasets in https://github.com/geopandas/geodatasets/pull/27. It is far from the complete list, but I believe the rest are not available as files and are instead generated in some form?

Nowosad commented 16 hours ago

The rest of them are .rda objects -- do you want all of the datasets from the README to be available (except the ones we discussed yesterday)? If so, I could just create another GH repo for that.

martinfleis commented 16 hours ago

It would be nice for the independence of R and Python examples that depend on the same data. The tiny snippet @Robinlovelace used during SDSL required R to run before Python, to load the file and dump it to disk before it could be read by geopandas. Having it available directly would allow more freedom in what runs first and in what runs at all.
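To illustrate the difference, here is a rough sketch of reading a shared dataset directly from Python with no R step. It assumes geopandas is installed and uses `geopandas.read_file`, which accepts a local path or a URL. The helper names `needs_r` and `read_shared_dataset` are hypothetical, as are the file names in the comments.

```python
from pathlib import PurePosixPath

# Formats that, in this sketch, are treated as requiring R to unpack first.
R_ONLY_FORMATS = {".rda"}


def needs_r(filename: str) -> bool:
    """Return True when the file is an R-specific object such as .rda."""
    return PurePosixPath(filename).suffix.lower() in R_ONLY_FORMATS


def read_shared_dataset(source: str):
    """Read an open-format vector file (e.g. GPKG, GeoJSON) into a GeoDataFrame."""
    if needs_r(source):
        # This is the case the thread wants to avoid: R must run first.
        raise ValueError(f"{source} is an R object; export it to an open format first")
    # Imported lazily so the format check works even without geopandas installed.
    import geopandas as gpd
    return gpd.read_file(source)


# Placeholder names: a GPKG can be read directly, an .rda cannot.
print(needs_r("example.gpkg"), needs_r("example.rda"))
```

With open-format files published alongside the .rda objects, the Python example no longer has to wait for an R session to write the data out.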

Robinlovelace commented 16 hours ago

+1 to increasing modularity and x-language compat (without having to depend on either for shared examples).