samsammurphy closed this issue 1 year ago
@sandorkertesz please have a look at this and how pdbufr will also play into it. In the end, loading points from geojson or from bufr should feel and look the same, no?
I think it is a good idea and would also work with BUFR data. At the moment, earthkit-data only offers the to_pandas() method to deal with BUFR, which extracts the specified data/columns into a pandas DataFrame using pdbufr. Since pdbufr already supports the "geometry" and "CRS" columns, this data would be fine for GeoPandas.
The main question for me is what API we want to offer for geospatial point data in earthkit-data on top of these methods:
Is that enough and can we do everything using GeoPandas?
Thanks for the chat earlier today @sandorkertesz. Following up on that, and the comments above:
to_geopandas() method
features (like in the geojson spec)
geospatial.csv (which was created by reading the geojson into a geopandas GeoDataFrame, then saving to .csv so that there is a geometry column with shapely objects represented as text)
Sample Data: example_vector_files.zip
The first to_geopandas implementation will be BUFR (see #84).
@sandorkertesz closing this issue but lmk if I should re-open
How should we load georeferenced shapes (e.g. points, lines, polygons) from a file?
Motivating use case: Load points from .csv and geopoints file (e.g. .geojson, .shp, .kml)
Standard practice: As a geospatial data scientist I would use geopandas to load shapes from a file into a geopandas dataframe.
GeoDataFrame makes it easy to, for example, filter by geographic region of interest, change the coordinate reference system (crs) and do other geospatial things like calculate distances and areas.
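To make the operations mentioned above concrete, here is a minimal sketch using plain geopandas (not an earthkit-data API). The station names, coordinates, bounding box and the choice of EPSG:3035 as a metric CRS are all assumptions made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Three illustrative points (lon, lat) in WGS84; names and values are made up.
gdf = gpd.GeoDataFrame(
    {"name": ["station_a", "station_b", "station_c"]},
    geometry=[Point(-0.97, 51.45), Point(7.10, 50.73), Point(11.34, 44.49)],
    crs="EPSG:4326",
)

# Filter by a geographic region of interest (a lon/lat bounding box).
roi = box(5.0, 40.0, 15.0, 55.0)
in_roi = gdf[gdf.within(roi)]

# Change the CRS to a projected (metric) one before measuring distances.
gdf_m = gdf.to_crs("EPSG:3035")
dist_m = gdf_m.geometry.iloc[0].distance(gdf_m.geometry.iloc[1])

print(list(in_roi["name"]))
print(f"distance between first two points: {dist_m / 1000:.0f} km")
```

Distances computed directly in EPSG:4326 would be in degrees, which is why the sketch reprojects first.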
Known Issue. A .csv file is not natively geospatial. This requires handling. In the case of loading points we would need to know which column(s) contains the point coordinates, and how to parse them, as well as the crs (which is typically not explicit). Here is an example of reading a csv into a geodataframe
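A minimal sketch of such a read, using pandas and geopandas directly. It covers two cases: coordinates held in separate columns, and a geometry column stored as WKT text (as in the geospatial.csv sample). The column names `lon` and `lat`, the inline CSV content and the CRS EPSG:4326 are assumptions for illustration; a real file would need these supplied by the user.

```python
import io

import geopandas as gpd
import pandas as pd

# Case 1: a plain .csv with coordinate columns (names "lon"/"lat" assumed).
csv_text = "station,lon,lat\nA,10.0,50.0\nB,11.5,48.2\n"
df = pd.read_csv(io.StringIO(csv_text))
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",  # assumed; a .csv does not carry a CRS
)

# Case 2: a .csv written from a GeoDataFrame, with geometry as WKT text.
wkt_csv = 'station,geometry\nA,POINT (10 50)\nB,"LINESTRING (0 0, 1 1)"\n'
df_wkt = pd.read_csv(io.StringIO(wkt_csv))
gdf_wkt = gpd.GeoDataFrame(
    df_wkt.drop(columns="geometry"),
    geometry=gpd.GeoSeries.from_wkt(df_wkt["geometry"]),
    crs="EPSG:4326",  # assumed, as above
)

print(gdf.geometry.iloc[0])
print(list(gdf_wkt.geom_type))
```

In both cases the caller has to state which columns hold the shapes and which CRS applies, which is the extra handling a .csv needs compared with .geojson or .shp.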
Opinionated view. We should follow the geopandas convention. When we write to file, shapes must be stored in a column called geometry. Geospatial methods work automatically (and exclusively) on the shapes in the geometry column. They should be shapely objects. It's fine to have different types of shapes (e.g. points and lines) in the geometry column. We can read a non-geospatial .csv file without falling over, but will complain when needed (e.g. the geometry column does not exist, the crs is not set, etc.).
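One way the "complain when needed" behaviour could look is sketched below. The helper `check_geospatial` is hypothetical and not part of earthkit-data or geopandas; it only illustrates the checks named above (geometry column present, crs set) and shows that mixed shape types in the geometry column are accepted.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, Point


def check_geospatial(df):
    """Hypothetical helper: list the reasons geospatial methods cannot be used."""
    problems = []
    if "geometry" not in df.columns:
        problems.append("no 'geometry' column")
    elif not isinstance(df, gpd.GeoDataFrame):
        problems.append("'geometry' column does not hold shapely objects")
    elif df.crs is None:
        problems.append("crs is not set")
    return problems


# A non-geospatial table loads fine; problems surface only when checked.
plain = pd.DataFrame({"station": ["A", "B"], "value": [1.0, 2.0]})

# Mixed shape types (a point and a line) in one geometry column, no crs yet.
mixed = gpd.GeoDataFrame(
    {"station": ["A", "B"]},
    geometry=[Point(0, 0), LineString([(0, 0), (1, 1)])],
)

print(check_geospatial(plain))
print(check_geospatial(mixed))
print(check_geospatial(mixed.set_crs("EPSG:4326")))
```

Returning a list of problems rather than raising immediately is a design choice for this sketch; raising an exception at the point of use would be an equally valid reading of the text.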