Closed nicholas-ys-tan closed 4 weeks ago
One thing that I wanted to confirm is that we don't want the user to be able to specify their own
bbox
column? So they can put in their own custom bounds that they calculate or modify themselves for whatever reason
It's true that you might already have this data available, but personally I would leave that until later in case there is actually user request for this. In general calculating bounds is not that expensive.
In general calculating bounds is not that expensive.
I did a quick test using the nz-building-outlines.gpkg
file (30 million polygons, 1.2 GB gpkg file), and writing that file with write_covering_bbox=True
enabled. For this case, calculating the bounds for the bbox column takes around 2% of the to_parquet
time.
Thanks @nicholas-ys-tan!
Resolves #3308
This is a continuation of #3282 to address 3 items:
- ensure this is robust for the geometry column name (no hardcoded "geometry") The was an implicit hardcoding that occurred when checking of the covering encoding was in the metadata, and when seeking the bbox encoding field name. My approach assumes that there will only be one entry for the geo_metadata columns, which I assume to be a valid assumption as there can only be one active geometry.
- test reading a file that uses a different column name as "bbox" The code as it is addresses this by looking at the metadata to discover the field name. To facilitate testing of this, some refactoring was done to add the field name as a kwarg in the private functions. This still maintains "bbox" as the only allowable fieldname when writing, but this refactor facilitated writing the tests. Hope that is acceptable.
- test writing in case your geodataframe already has a bbox column When testing, no error showed when writing the parquet file, but an error was raised when reading - it was not a very descriptive error but seems to be attributed to two fields having identical names. A ValueError has been added if the dataframe already has a column with the name "bbox" and the user has
write_bbox_covering=True
.One thing that I wanted to confirm is that we don't want the user to be able to specify their own
bbox
column? So they can put in their own custom bounds that they calculate or modify themselves for whatever reason - in which case I should be parsing theirbbox
column to ensure formatting is appropriate and pushing that in, instead of calculating it for the user?