Add clustering to BigQuery tables where appropriate

blockchain-etl / ethereum-etl-airflow

Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee

MIT License


Add clustering to BigQuery tables where appropriate #28

Open medvedev1088 opened 5 years ago

medvedev1088 commented 5 years ago

See the BigQuery docs on clustered tables (https://cloud.google.com/bigquery/docs/clustered-tables). Clustering can make some queries cheaper and faster.

allenday commented 4 years ago

I was thinking of this when reading @askeluv's blog post: https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee. If we implement clustering/partitioning, it should be possible to reduce the cost of querying the logs table directly, i.e. there would be no need to create contract-specific tables.

However, there is some active work being done for streaming into partitions, see: https://issuetracker.google.com/issues/35905817#comment89
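
To make the clustering/partitioning idea concrete, here is a minimal sketch that only builds the DDL string for a day-partitioned, clustered copy of the logs table. It is not what the DAGs in this repo do. Several names are assumptions: the source table `bigquery-public-data.crypto_ethereum.logs`, the destination `my-project.my_dataset.logs_clustered` (hypothetical), and the choice of `block_timestamp` for partitioning and `address` for clustering. The helper `build_clustered_copy_ddl` is made up for illustration.

```python
from typing import Sequence

# BigQuery allows at most four clustering columns per table.
MAX_CLUSTER_COLUMNS = 4


def build_clustered_copy_ddl(
    source_table: str,
    dest_table: str,
    partition_column: str,
    cluster_columns: Sequence[str],
) -> str:
    """Build a CREATE TABLE ... AS SELECT statement that copies source_table
    into a day-partitioned, clustered dest_table.

    Hypothetical helper: it only formats SQL and never talks to BigQuery.
    """
    if not cluster_columns:
        raise ValueError("at least one clustering column is required")
    if len(cluster_columns) > MAX_CLUSTER_COLUMNS:
        raise ValueError(
            f"at most {MAX_CLUSTER_COLUMNS} clustering columns are allowed"
        )
    return (
        f"CREATE TABLE `{dest_table}`\n"
        f"PARTITION BY DATE({partition_column})\n"
        f"CLUSTER BY {', '.join(cluster_columns)}\n"
        f"AS SELECT * FROM `{source_table}`"
    )


ddl = build_clustered_copy_ddl(
    source_table="bigquery-public-data.crypto_ethereum.logs",  # assumed source
    dest_table="my-project.my_dataset.logs_clustered",  # hypothetical target
    partition_column="block_timestamp",  # assumed TIMESTAMP column
    cluster_columns=["address"],  # assumed: queries filter by contract address
)
print(ddl)
```

With a table laid out like this, a query that filters on `address` (and on a date range of `block_timestamp`) should be able to skip most of the data, which is the cost reduction discussed above. The actual savings depend on the data and the query.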

medvedev1088 commented 4 years ago

That's a good idea.