WorldCereal / ewoc_rdm_api

Backend APIs for Reference Data Module, used by website and other modules
MIT License

Geoparquet: test on very large file #7

Closed jdries closed 4 months ago

jdries commented 5 months ago
jdries commented 4 months ago

Done: tests show that DuckDB can handle this file. I was also able to write a partitioned version, partitioned by H3 index: https://duckdb.org/docs/data/partitioning/partitioned_writes.html

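import duckdb
db = duckdb.connect()
db.execute("SET threads=1;")  # run single-threaded, as in this test
# Rewrite the GeoParquet file as a dataset partitioned by the H3 level-3 cell column.
db.execute("""
    COPY (SELECT * FROM read_parquet('/home/driesj/2018_FR_LPIS_POLY_110.geoparquet'))
    TO 'lpis' (FORMAT PARQUET, PARTITION_BY (h3_l3_cell), OVERWRITE_OR_IGNORE 1, PER_THREAD_OUTPUT false);
""")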

This took only 25 seconds, though it does use some memory.

This seems to imply that a parquet-based rdm can also work, as long as we apply partitioning.