Closed jorisvandenbossche closed 2 months ago
It would be nice to somehow avoid repeated traversals over the GeoDataFrame to standardize the geometries to their multi equivalents (under
promote_to_multi
option), but I'm not seeing anything obvious. I wonder if this would be a good function to consider adding to shapely to do a single traversal over the geometry array and promote each type to its multi equivalent...
Yes, this is currently lacking a proper helper in shapely, see https://github.com/shapely/shapely/issues/1617
One remaining review comment is the one about writing the index or not: https://github.com/geopandas/pyogrio/pull/395#discussion_r1580972807
For consistency, I should probably drop the index here too? Although we should maybe reconsider that for the default write_dataframe code path as well, to actually write the index of the dataframe as a field, if not a RangeIndex?
For the use in geopandas itself, it doesn't matter, because it has its own logic (it will reset the index to a column to ensure it gets written)
One remaining review comment is the one about writing the index or not: #395 (comment)
For consistency, I should probably drop the index here too? Although we should maybe reconsider that for the default write_dataframe code path as well, to actually write the index of the dataframe as a field, if not a RangeIndex?
For the use in geopandas itself, it doesn't matter, because it has its own logic (it will reset the index to a column to ensure it gets written)
I don't have an explicit preference either way, but it indeed should be consistent...
One remaining review comment is the one about writing the index or not: https://github.com/geopandas/pyogrio/pull/395#discussion_r1580972807
my preference would be to drop it to be consistent, and revisit both of those in the future if needed, and possibly do so on the GeoPandas side rather than here. There are a number of issues to consider if we do make it a column: how to handle name collisions, name truncation (e.g., shapefile), multi-indexes, and round-tripping as an index.
OK, dropped the index here, and also added an explicit test for it (other tests will also fail in case the index would get preserved, but good to have an explicit test about it)
revisit both of those in the future if needed, and possibly do so on the GeoPandas side rather than here.
Note GeoPandas already has considered this (they explicitly reset the index to a column in order to preserve it, if the index has a name and is not a RangeIndex)
Thanks @jorisvandenbossche !
@jorisvandenbossche it looks like testing the sdist failed after merging this: https://github.com/geopandas/pyogrio/actions/runs/8914593210/job/24482500369
Will take a look, I suppose a missing marker (although a bit strange it only fails there)
Follow-up on https://github.com/geopandas/pyogrio/pull/346 to expose the Arrow-based writing through geopandas-based
write_dataframe
as well.Updated our geopandas writing tests to be parametrized on
use_arrow
for writing as well. This didn't catch any issues with thewrite_arrow
itself, but we needed to do some pre-processing to handle the promotion to multi types consistently (as in the normal write path, this happens per geometry insideogr_write
)There are a few things that don't work with Arrow:
No support (yet) for writing an all-missing geometry column (the conversion from pandas to Arrow then doesn't result in the correct type, something that could be fixed in the pre-processing)-> add that in the preprocessing