blockchain-etl / ethereum-etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
https://t.me/BlockchainETL
MIT License
2.94k stars 841 forks source link

add support for ClickHouse as export target #422

Open gingerwizard opened 1 year ago

gingerwizard commented 1 year ago

ClickHouse is a popular OSS OLAP db and a natural fit for this data

dex-sv commented 1 year ago

Clickhouse is not an OLTP db

tbragin commented 1 year ago

I'm sure that's a typo. @gingerwizard let's fix :) ClickHouse is an OLAP database.

gingerwizard commented 1 year ago

Fixed 🤦

dex-sv commented 1 year ago

Hi, I did a few experiments with the code from this PR and I got errors like those:

OverflowError: int too big to convert clickhouse_connect.driver.exceptions.ProgrammingError: Internal serialization error. This usually indicates invalid data types in an inserted row or column

The errors are caused by attempting to insert large integer values into numeric columns.

Could you please share some info about the decisions on the data types used in the tables?

gingerwizard commented 1 year ago

I will PR the fix for this and correct these types .

Decimal(38, 0) was selected as this is the eqv. of what BigQuery applied. I assumed it could grow beyond the range of a UInt64. I'll check the data and update.

For timestamps Id like to move to the explicit DATETIME type.