Open PetroTruemetrics opened 5 days ago
I think one reason this is taking longer than expected is the (still outstanding) Athena bug where summary statistics on a nested float column (our bbox column) return incorrect results. More here: https://github.com/OvertureMaps/data/discussions/1#discussioncomment-9159544. Because of that, the table currently has statistics disabled, which causes longer run times and more data scanned. @mojodna
We should consider just reverting the bbox column to doubles.
@jwass is there a special reason that there is no S2 or H3 partitioning?
Is there another way to get around that problem? I currently see no way to use any geospatial indexing with AWS Athena, which makes Overture useless in scenarios where you only want to read a small portion of the data.
E.g. I have already spent hundreds of dollars on Athena costs just for loading a couple of hundred building polygons via Overture.
I have the following kind of query in AWS Athena. It takes about 12-13 seconds to run and scans over 20 GB of data, which is too slow for my use case. I would like to partition by a division, for example by country, but some rows, in particular in the following location, seem to have division-related data completely missing.
Is there any other way I could make the query run faster?
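For reference, here is a minimal sketch of the kind of bbox-window filter this thread is about, written as a small Python helper that builds the Athena SQL. The table name `overture.building` and the helper `bbox_query` are hypothetical, and the `id`, `geometry`, and `bbox.xmin`/`xmax`/`ymin`/`ymax` column names are assumed from the Overture schema. Whether the filter actually reduces the data scanned depends on Athena being able to use the bbox column statistics, which, as noted above, are currently disabled.

```python
def bbox_query(table: str, xmin: float, ymin: float, xmax: float, ymax: float) -> str:
    """Build an Athena SQL query keeping rows whose bbox intersects the window.

    `table` is a hypothetical table name; the column names are assumptions.
    """
    if xmin > xmax or ymin > ymax:
        raise ValueError("expected xmin <= xmax and ymin <= ymax")
    # Two boxes intersect when, on each axis, each one starts
    # before the other one ends.
    return (
        f"SELECT id, geometry\n"
        f"FROM {table}\n"
        f"WHERE bbox.xmin <= {xmax}\n"
        f"  AND bbox.xmax >= {xmin}\n"
        f"  AND bbox.ymin <= {ymax}\n"
        f"  AND bbox.ymax >= {ymin}"
    )


# Example window (arbitrary coordinates, for illustration only).
sql = bbox_query("overture.building", 13.3, 52.4, 13.5, 52.6)
print(sql)
```

Putting the predicate on the plain bbox fields, rather than on a geometry function, is what would allow row groups to be skipped once statistics can be used again.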