celestiaorg / celestia-app


Using BigQuery for big block data analysis #3389

Open staheri14 opened 5 months ago

staheri14 commented 5 months ago

Problem

During a test with 100 nodes, we accumulate approximately 100 GB of trace data. Analyzing this data currently requires either downloading all of it locally to run queries or selectively fetching the subsets relevant to the analysis. Both approaches are limited by the CPU capacity and disk space of individual machines. We propose using BigQuery instead: the data stays in the cloud, and queries run there without downloading it or relying on the resources of a local machine.
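
As a rough illustration of the workflow proposed above, here is a minimal Python sketch. It assumes the trace data has already been loaded into a BigQuery table. The project, dataset, table, and column names (`my-project`, `traces`, `round_state`, `node_id`) are hypothetical placeholders, not names from this repository. The query runs inside BigQuery, so only the aggregated result is transferred to the local machine.

```python
import re

# Conservative character sets for identifiers interpolated into SQL.
_PATH_PART = re.compile(r"^[A-Za-z0-9_-]+$")
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def table_ref(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted, fully qualified BigQuery table reference."""
    for part in (project, dataset, table):
        if not _PATH_PART.match(part):
            raise ValueError(f"unexpected character in identifier: {part!r}")
    return f"`{project}.{dataset}.{table}`"


def rows_per_node_sql(project: str, dataset: str, table: str,
                      node_column: str = "node_id") -> str:
    """Build a query that counts traced rows per node.

    `node_column` is an assumed column name; adjust it to the real schema.
    """
    if not _COLUMN.match(node_column):
        raise ValueError(f"unexpected column name: {node_column!r}")
    return (
        f"SELECT {node_column}, COUNT(*) AS row_count\n"
        f"FROM {table_ref(project, dataset, table)}\n"
        f"GROUP BY {node_column}\n"
        f"ORDER BY row_count DESC"
    )


def run_query(sql: str, project: str) -> list:
    """Run `sql` in BigQuery and return the result rows as dictionaries.

    Requires the google-cloud-bigquery package and valid credentials.
    The import is deferred so the helpers above work without it.
    """
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    return [dict(row.items()) for row in client.query(sql).result()]
```

Calling `run_query(rows_per_node_sql("my-project", "traces", "round_state"), "my-project")` would then return one small row per node, regardless of how large the underlying table is.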

Acceptance Criteria

This task involves two primary objectives:

staheri14 commented 5 months ago

I'd like to share some updates:

I'll keep you updated as I explore more features.

staheri14 commented 4 months ago

After further investigation, I found that to use Jupyter Notebook with BigQuery, we need to use Vertex AI Workbench. This managed Google Cloud service offers the following capabilities that suit our use cases: