davidgasquez / gitcoin-grants-data-portal

🌲 Open source, serverless, and local-first data hub for Gitcoin Grants data!
https://grantsdataportal.xyz/
MIT License

Pull data directly from chain #1

Open davidgasquez opened 10 months ago

davidgasquez commented 10 months ago

Currently, we rely on the Allo Indexer API data. We should add an option to pull data straight from the chains using something like cryo or Subsquid. This way, we don't need to trust the Allo API data, if that's what we want.

davidgasquez commented 8 months ago

Can Gitcoin Data Portal rely on Indexed data?

davidgasquez commented 8 months ago

Can Gitcoin Data Portal rely on Indexed data?

Probably not because Indexed is missing many chains in which GC rounds are running.

We need something like cryo.

davidgasquez commented 8 months ago

This works!

import cryo

cryo.collect(
    "transactions",
    blocks=["18.9M"], 
    rpc="https://eth.merkle.io",
    reorg_buffer=1000,
    max_concurrent_chunks=15, 
    inner_request_size=10000,
    output_dir="data",
    contract=["0x03506eD3f57892C85DB20C36846e9c808aFe9ef4"],
    hex=True
)

Don't forget to pip install cryo-python polars though!

davidgasquez commented 8 months ago

Made a small Colab notebook for people to play around.

From a quick test, it'll take around 52 hours to fully index that contract, 0x03506eD3f57892C85DB20C36846e9c808aFe9ef4, on Ethereum mainnet.

DistributedDoge commented 8 months ago

cryo.freeze(
    "events",
    blocks=["16071515:"],
    rpc="https://eth.merkle.io",
    reorg_buffer=1000,
    max_concurrent_chunks=100,
    inner_request_size=10_000,
    output_dir="data_fast",
    contract=["0x03506eD3f57892C85DB20C36846e9c808aFe9ef4"],
    hex=True
)

davidgasquez commented 8 months ago

Woah! I did try with higher max_concurrent_chunks but didn't get any speedup locally... interesting!

While TXs need some thinking, if performance inside the CI runner is comparable, event-based assets seem feasible now.

:rocket:

DistributedDoge commented 7 months ago

Just leaving a note that tx data from Covalent is quite neat for analyzing the cost side, as it already has dollarized amounts for the actual gas cost.

Unfortunately, the fetch is a bit on the longer side. Figuring out the incremental part could help save a lot of time and API credits (that we still have aplenty).

davidgasquez commented 7 months ago

I think total gas cost of mainnet transactions dealing with grants stack project profiles was $23k for about 2.3k operations.

Nice! Would be awesome to publish a report inside Quarto analyzing the new data and showing the process to derive these numbers.

Unfortunately, the fetch is a bit on the longer side. Figuring out the incremental part could help save a lot of time and API credits (that we still have aplenty).

Free API key request limit of 4/second => need to limit parallel runs for assets of that type.

3 minutes to pull 2.3k events in pages of 100 isn't that impressive.

Understandable. We really need to think harder about #28. Meanwhile, we can always do it slowly. GitHub Actions errors out after... 6 hours, I think. :man_shrugging:
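
For a rough sense of where the time goes, here is a back-of-the-envelope check on the figures quoted above. It is only a sketch: it treats 2.3k events, pages of 100, 3 minutes, and 4 requests/second as exact, and it assumes the pages were fetched one after another, which the thread does not state.

```python
import math

# Figures quoted in the thread, treated as exact for illustration only.
EVENTS = 2_300
PAGE_SIZE = 100
TOTAL_SECONDS = 3 * 60
RATE_LIMIT_PER_SECOND = 4  # free API key limit mentioned in the thread

# Number of paginated requests needed to cover all events.
pages = math.ceil(EVENTS / PAGE_SIZE)

# Average observed time per page, assuming sequential requests (assumption).
seconds_per_page = TOTAL_SECONDS / pages

# Lower bound on total time if the rate limit were the only constraint.
rate_limited_floor = pages / RATE_LIMIT_PER_SECOND

print(f"pages: {pages}")
print(f"observed seconds per page: {seconds_per_page:.1f}")
print(f"floor from rate limit alone: {rate_limited_floor:.2f} s")
```

Under those assumptions each page took several seconds, while the rate limit alone would allow all pages within a few seconds, so per-request latency looks like the bigger cost than the 4/second cap.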

davidgasquez commented 7 months ago

I'm keeping an eye on mesc and its integration with Cryo. I think there might be a simple approach to get data from multiple chains easily. Probably slower than Covalent, unless we do partitions + incremental runs!
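
One possible shape for the "partitions + incremental" idea, as a minimal sketch rather than anything the portal implements. The function name `incremental_partitions`, the partition size, and the example block numbers are all hypothetical; the `reorg_buffer` default just mirrors the value used in the cryo calls earlier in this thread.

```python
from typing import Iterator, Tuple


def incremental_partitions(
    last_indexed: int,
    chain_head: int,
    partition_size: int = 100_000,
    reorg_buffer: int = 1_000,
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) block ranges, end exclusive, that are not indexed yet.

    Blocks within `reorg_buffer` of the head are skipped so that a chain
    reorganization cannot invalidate data that was already written.
    """
    safe_head = chain_head - reorg_buffer
    start = last_indexed + 1
    while start <= safe_head:
        # Clamp the last partition so it never reaches past the safe head.
        end = min(start + partition_size, safe_head + 1)
        yield start, end
        start = end


# Hypothetical state: a previous run stopped at block 18_900_000.
ranges = list(incremental_partitions(18_900_000, 19_151_500))
for start, end in ranges:
    # Printed as "start:end"; whether this maps directly onto cryo's
    # `blocks` argument (and its range inclusivity) is an assumption to verify.
    print(f"{start}:{end}")
```

Each range could then be fetched by a separate job, and only ranges past the last stored block would need to run on later updates.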