davidgasquez / gitcoin-grants-data-portal

🌲 Open source, serverless, and local-first data hub for Gitcoin Grants data!
https://grantsdataportal.xyz/
MIT License

Optimize Parquet files #66

Open: davidgasquez opened this issue 7 months ago

davidgasquez commented 7 months ago

Compress and sort them! Sorting the rows puts equal values next to each other, which should help the compression.

Not sure if the same can be done for the DuckDB database.

davidgasquez commented 7 months ago
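```sql
-- Sort so equal values sit next to each other, then write zstd-compressed
-- Parquet with large row groups (up to 10M rows each).
copy (
    select * from 'data/allo_donations.parquet'
    order by round_id, donor_address, project_id, token_address, recipient_address
) to 'data/allo_comp_rs_sorted.parquet' (format parquet, compression 'zstd', row_group_size 10000000);
```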
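To check that sorting actually pays off, here is a minimal DuckDB sketch that writes the same table twice, once shuffled and once sorted, with the same zstd codec, and compares the compressed column-chunk sizes reported by `parquet_metadata`. The `demo` table, its synthetic columns and value ranges, and the `demo_*.parquet` file names are hypothetical stand-ins for `allo_donations`, not the portal's real data.

```sql
-- Hypothetical stand-in for allo_donations: 200k rows of synthetic,
-- highly repetitive values (not the portal's real data).
create or replace table demo as
select
    i % 20                                  as round_id,
    'donor_' || (hash(i) % 5000)::varchar   as donor_address,
    i % 300                                 as project_id
from range(200000) t(i);

-- Same rows, same codec; only the row order differs.
copy (select * from demo order by random())
to 'demo_unsorted.parquet' (format parquet, compression 'zstd');

copy (select * from demo order by round_id, donor_address, project_id)
to 'demo_sorted.parquet' (format parquet, compression 'zstd');

-- Sum the compressed size of every column chunk in each file.
select 'unsorted' as variant, sum(total_compressed_size) as compressed_bytes
from parquet_metadata('demo_unsorted.parquet')
union all
select 'sorted', sum(total_compressed_size)
from parquet_metadata('demo_sorted.parquet');
```

The sorted file should come out smaller here, mostly because `round_id` collapses into long runs of identical values. On the real donations data the gain depends on how repetitive the chosen sort columns are, so the same `parquet_metadata` query is worth running against `data/allo_comp_rs_sorted.parquet` and the original file.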