jdries closed 7 months ago
Done: tests show that DuckDB can handle this file. I was also able to write a partitioned version, partitioned by H3 index: https://duckdb.org/docs/data/partitioning/partitioned_writes.html
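import duckdb

# In-memory DuckDB connection
db = duckdb.connect()
# Single-threaded partitioned Parquet write: one directory per h3_l3_cell value.
# SQL string literals use single quotes; double quotes denote identifiers.
db.execute("""
    SET threads=1;
    COPY (SELECT * FROM read_parquet('/home/driesj/2018_FR_LPIS_POLY_110.geoparquet'))
    TO 'lpis' (FORMAT PARQUET, PARTITION_BY (h3_l3_cell), OVERWRITE_OR_IGNORE 1, per_thread_output false);
""")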
This took only 25 seconds, though it does use some memory.
This seems to imply that a parquet-based rdm can also work, as long as we apply partitioning.