Data partitioning - GitHub Issues

PetroTruemetrics commented 5 days ago

I have the following kind of query in AWS Athena. It takes about 12-13 seconds to run and scans over 20 GB of data, which is too slow for my use case. I would like to partition by a division, for example by country, but some rows, in particular in the following location, seem to have division-related data completely missing.

Is there any other way to make the query run faster?

SELECT *, ST_GeomFromBinary(geometry) AS geometry
FROM v2024_07_22_0
WHERE (theme = 'buildings' AND type = 'building')
AND bbox.xmin > 25.260103773772702 
AND bbox.xmax < 25.264066227154125
AND bbox.ymin > 54.66989833391441 
AND bbox.ymax < 54.6724108411568
AND ST_Intersects(
   ST_GeomFromBinary(geometry), 
   ST_GeometryFromText('POLYGON ((25.2616152492726 54.67081885611437, 25.262554637777246 54.67081885611437, 25.262554637777246 54.67149033634276, 25.2616152492726 54.67149033634276, 25.2616152492726 54.67081885611437))')
);

jwass commented 5 days ago

I think one of the reasons this is taking longer than expected is the (still outstanding) Athena bug where summary statistics on a nested float column (our bbox column) return incorrect results. More here: https://github.com/OvertureMaps/data/discussions/1#discussioncomment-9159544. So the table currently has the use of statistics disabled, which causes longer run times and more data scanned. @mojodna

We should consider just switching the bbox column back to doubles.
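
To illustrate why disabled statistics increase the data scanned: a Parquet reader can skip a whole row group when its per-column min/max statistics prove that no row in it can satisfy the filter. The following is a minimal sketch in plain Python of that idea applied to the bbox filter from the query above. It is not Athena's actual implementation; the `RowGroupStats` type, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics for the bbox fields of one row group."""
    xmin_max: float  # largest bbox.xmin in the row group
    xmax_min: float  # smallest bbox.xmax in the row group
    ymin_max: float  # largest bbox.ymin in the row group
    ymax_min: float  # smallest bbox.ymax in the row group
    size_bytes: int


def can_skip(stats, qx0, qx1, qy0, qy1):
    """True if no row can satisfy
    bbox.xmin > qx0 AND bbox.xmax < qx1 AND bbox.ymin > qy0 AND bbox.ymax < qy1.
    """
    return (
        stats.xmin_max <= qx0     # even the largest xmin is not > qx0
        or stats.xmax_min >= qx1  # even the smallest xmax is not < qx1
        or stats.ymin_max <= qy0
        or stats.ymax_min >= qy1
    )


def bytes_scanned(groups, query, use_stats):
    """Sum the sizes of the row groups that still have to be read."""
    total = 0
    for group in groups:
        if use_stats and can_skip(group, *query):
            continue
        total += group.size_bytes
    return total


# Query window (qx0, qx1, qy0, qy1), rounded from the query above.
query = (25.2601, 25.2641, 54.6699, 54.6724)

# Made-up row groups: one overlapping the window, two far away from it.
groups = [
    RowGroupStats(25.30, 25.20, 54.70, 54.60, size_bytes=100),
    RowGroupStats(-70.0, -75.0, 45.0, 40.0, size_bytes=200),
    RowGroupStats(30.0, 20.0, 65.0, 60.0, size_bytes=300),
]

print(bytes_scanned(groups, query, use_stats=True))
print(bytes_scanned(groups, query, use_stats=False))
```

With statistics in use only the first row group is read. Without them every row group has to be read and filtered row by row.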

JBisc commented 5 days ago

@jwass Is there a particular reason that there is no S2 or H3 partitioning?
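
For context on what such partitioning would buy: with a spatial cell key as a partition column, a query only has to read the partitions whose cells overlap the query box. The sketch below uses a plain fixed-size longitude/latitude grid as a simple stand-in (it is neither S2 nor H3). The function names and the 1-degree cell size are assumptions for illustration only.

```python
import math


def cell_key(lon, lat, cell_deg=1.0):
    """Return a grid cell key for a point (hypothetical scheme, not S2/H3)."""
    col = math.floor((lon + 180.0) / cell_deg)
    row = math.floor((lat + 90.0) / cell_deg)
    return f"{col}_{row}"


def cells_for_bbox(xmin, ymin, xmax, ymax, cell_deg=1.0):
    """Return the keys of all grid cells that overlap a bounding box."""
    col0 = math.floor((xmin + 180.0) / cell_deg)
    col1 = math.floor((xmax + 180.0) / cell_deg)
    row0 = math.floor((ymin + 90.0) / cell_deg)
    row1 = math.floor((ymax + 90.0) / cell_deg)
    return [
        f"{col}_{row}"
        for col in range(col0, col1 + 1)
        for row in range(row0, row1 + 1)
    ]


# The small query window from the original question falls into a single cell.
print(cells_for_bbox(25.2601, 54.6699, 25.2641, 54.6724))

# A window that straddles cell borders needs several cells.
print(cells_for_bbox(24.9, 54.9, 25.1, 55.1))
```

The resulting keys could be used to restrict a query to the matching partitions, provided the table had such a partition column, which it currently does not.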

JBisc commented 2 days ago

Is there another way to get around that problem? I currently see no way to use any geospatial indexing with AWS Athena, which makes Overture useless in scenarios where you only want to read a small portion of the data.

E.g. I have already spent hundreds of dollars on Athena costs just to load a couple of hundred building polygons via Overture.
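
As a rough back-of-the-envelope check of how the cost adds up, here is a small sketch. It assumes a price of 5 USD per TB scanned (an assumption; the actual rate depends on region and may change), decimal units (1 TB = 1000 GB), and the roughly 20 GB scanned per query mentioned at the top of this thread. The 200 USD budget is a hypothetical figure standing in for "hundreds of dollars".

```python
def query_cost_usd(gb_scanned, usd_per_tb=5.0):
    """Estimated cost of one query; usd_per_tb is an assumed price."""
    return gb_scanned / 1000.0 * usd_per_tb


def queries_within_budget(budget_usd, gb_scanned, usd_per_tb=5.0):
    """Approximate number of such queries a given budget pays for."""
    return budget_usd / query_cost_usd(gb_scanned, usd_per_tb)


cost = query_cost_usd(20.0)                  # about 0.10 USD per query
count = queries_within_budget(200.0, 20.0)   # about 2000 queries
print(f"{cost:.2f} USD per query, about {count:.0f} queries for 200 USD")
```

Under these assumptions each bounding-box lookup costs about 0.10 USD regardless of how few polygons it returns, so the cost is driven by the data scanned rather than by the size of the result.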

OvertureMaps / data

Data partitioning #226