blockchain-etl / ethereum-etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
https://t.me/BlockchainETL
MIT License

Provide access to full history through GCS buckets #357

Open: bryzgaloff opened this issue 2 years ago

bryzgaloff commented 2 years ago

The BigQuery datasets are publicly available. It would also be awesome to unload the full Ethereum history to, let's say, AWS S3 and make it queryable through Athena.

My request: could you please provide access to the raw data in GCS buckets? Parquet would be ideal, but JSON or CSV is also fine. I believe unloading from GCS would be more efficient than unloading from BigQuery; please correct me if I am wrong.

If you can give me access to the buckets, I can prepare the data for public use in Parquet.

medvedev1088 commented 2 years ago

We don't expose the exported files in GCS at this point. An alternative is to use the scripts in https://github.com/blockchain-etl/ethereum-etl-postgres to export the data from BigQuery to GCS.
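For anyone who wants to run such an export themselves, here is a minimal sketch of one possible route. It assumes the public dataset `bigquery-public-data.crypto_ethereum` and BigQuery's standard-SQL `EXPORT DATA` statement, and it is not necessarily what the ethereum-etl-postgres scripts do. The helper `build_export_sql`, the bucket `my-eth-export` and the `ethereum/` prefix are hypothetical placeholders. The helper only builds the SQL text, so it runs without GCP credentials.

```python
# Hypothetical helper (not part of ethereum-etl): builds a BigQuery
# EXPORT DATA statement that writes one table of the public Ethereum
# dataset to a GCS bucket as Parquet files.
PUBLIC_DATASET = "bigquery-public-data.crypto_ethereum"


def build_export_sql(table: str, bucket: str, prefix: str = "ethereum") -> str:
    """Return an EXPORT DATA statement for `table`.

    EXPORT DATA expects a single '*' wildcard in the destination URI;
    BigQuery fills it with a shard number, so a large table is split
    across many files.
    """
    # Reject anything that is not a plain table name before interpolating it.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    uri = f"gs://{bucket}/{prefix}/{table}/*.parquet"
    return (
        "EXPORT DATA OPTIONS(\n"
        f"  uri='{uri}',\n"
        "  format='PARQUET',\n"
        "  compression='SNAPPY',\n"
        "  overwrite=true\n"
        ")\n"
        f"AS SELECT * FROM `{PUBLIC_DATASET}.{table}`"
    )


if __name__ == "__main__":
    # "my-eth-export" is a placeholder; use a bucket you own.
    for name in ("blocks", "transactions", "logs", "token_transfers"):
        print(build_export_sql(name, "my-eth-export"))
        print()
```

Each printed statement could then be run in the BigQuery console or with `bq query --use_legacy_sql=false`. Note that the export reads the whole table, so the query cost would presumably be billed to the project that runs it.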

bryzgaloff commented 2 years ago

Could this be planned for implementation? I am happy to contribute if you can give me sufficient access to your GCS buckets / GCP account. I could configure read-only credentials for the GCS bucket with the data, or perhaps make it public.

Once I dig into it a little, I can share some alternatives for you to choose from, but I would like to see the data itself first to judge whether it needs any reformatting.