cheginit closed this 4 months ago
Thanks @cheginit. We are big fans of DuckDB and have some documentation on using it to get data out: https://docs.overturemaps.org/getting-data/locally/ I opted to use pyarrow here, though. I'll close this out.
By the way, I think the query in your post has a bug:
SELECT
data.*,
ST_GeomFromWKB(data.geometry) as geometry,
FROM data_view AS data
...
will have 2 geometry columns, since `data.*` also has one. I ran it to verify: the output parquet file has both a `geometry` and a `geometry_1` column. I think you can use DuckDB's `EXCLUDE` clause to omit it from the first part. But since `geometry` is already WKB, I don't think you need anything other than `data.*` there.
Thanks for catching the bug, and for the link. I didn't realize that documentation existed!
May I ask why you opted to use `pyarrow`?
While I was updating a blog post that I wrote a while back on subsetting Overture data using DuckDB, I stumbled upon this package and noticed that it uses `pyarrow`. I thought you might be interested in exploring DuckDB as an alternative approach, as it might have some benefits, especially for large requests. Here's the link to my short blog post containing the code.