AmericanRedCross / street-view-green-view

BSD 3-Clause "New" or "Revised" License

geospatial data storage format #16

Closed danbjoseph closed 7 months ago

danbjoseph commented 7 months ago

The process starts with generating a collection of geographic points and then adds additional attributes such as the filename of associated image(s) and GVI score(s). Is shapefile the best format to use for manipulating and storing our data?

That said, I also wonder whether any problems with the geospatial file format would only show up well after other bottlenecks do (for example, saving an image for each point to disk).

jayqi commented 7 months ago

As currently implemented, create_points.py works with any of the standard vector formats, such as Shapefile, GeoJSON, or GeoPackage, on both the input side and the output side (the two can be the same format or different ones). We use geopandas to read and write the files, and geopandas supports all of these equivalently.

I think we should continue to write our code that way (e.g., the step that reads in the points to download images should also use geopandas and accept any of the vector data formats), as this flexibility has no real cost to us and seems nice to have.

Perhaps a different question we want to answer is: Should this project adopt a particular vector data format as a convention? That would mean standardizing on one format for the files we control, even though the implementation would work with other formats. This seems reasonable to me. I don't do enough GIS data work to have a strong opinion on which format, but GeoPackage is as good as any.


As for whether we should just save the points as CSV files: my inclination is to keep them in a vector data format instead. That simplifies use cases like plotting the locations or opening the files in a desktop app like QGIS.

danbjoseph commented 7 months ago

Oh yeah, it's great that the initial input is flexible in format. For the output file that is passed between steps, I think GeoPackage might be the better choice. QGIS can open a CSV of points just like any other file format, but I think GeoPackage may offer performance benefits with large collections of points.