blockchain-etl / ethereum-etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
https://t.me/BlockchainETL
MIT License
2.94k stars 845 forks source link

Why separate transactions from blocks in bq schema? #108

Closed jotto closed 5 years ago

jotto commented 6 years ago

I'm curious about the decision to separate blocks from transactions in the BigQuery schema. Is it a personal preference of querying style (SQL joins rather than unnesting SQL arrays?) Or is there a performance or technical reason for it?

medvedev1088 commented 6 years ago

Mainly for simplicity. Most developers are familiar with the standard SQL and joins and are not familiar with non-standard operations like unnesting. From performance point of view BigQuery docs actually recommend to denormalize.

medvedev1088 commented 6 years ago

@jotto did you face any issue related to this e.g. some queries run significantly slower?

jotto commented 6 years ago

@medvedev1088 thanks for the quick response, I haven't tested the performance yet. I'm using your project for architectural inspiration for other blockchains so I wanted to understand if there was a specific reason for the schema.

Do you have any thoughts on querying or persisting state of Ethereum accounts per block height?

medvedev1088 commented 6 years ago

@jotto for state diffs we can use parity trace API. Tracing transactions has already been implemented here https://github.com/blockchain-etl/ethereum-etl/pull/104 but without stateDiff support.

Feel free to submit an issue to add support for state diffs and I will request to fund it with GitCoins similar to this task: https://github.com/blockchain-etl/ethereum-etl/issues/53.

Also feel free to join Gitter and ask questions there https://gitter.im/ethereum-etl/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge