davidgasquez / gitcoin-grants-data-portal

🌲 Open source, serverless, and local-first data hub for Gitcoin Grants data!
https://grantsdataportal.xyz/
MIT License

Flipside/Dune Integration #38

Open davidgasquez opened 8 months ago

davidgasquez commented 8 months ago

We can push datasets to Flipside and Dune via Dagster assets.

davidgasquez commented 8 months ago

Dune has an upload limit of 1MiB...

We can probably upload some daily metrics, but not much more.

@DistributedDoge, do you know how the Flipside external data works?

Seems their live query stuff would work for JSON data, but I'm not sure if it can read Parquet files.

DistributedDoge commented 8 months ago
  1. From what I see in the docs and UI, Dune says the upload limit is 200MB per CSV table. See here; no clue if it is accurate, though.

  2. Last time I tried, Flipside could not read anything other than JSON. It will almost certainly work with an IPFS-hosted file, though.

EDIT: For 2, to avoid making things messy, we could e.g. push a subset of tables as JSON to a different Filebase bucket.

davidgasquez commented 7 months ago

Dune is telling me upload limit is 200MB per .CSV table

I think that might be correct. The limit I hit comes from the Free plan, which limits storage to 1MB.


For 2, to avoid making things messy we could e.g. push a subset of tables as json to different Filebase bucket.

That could work! We'll need to check how efficient those queries are though. Querying uncompressed JSON is not very efficient.
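A quick way to get a feel for the overhead is to serialize a table as newline-delimited JSON and compare its size with a gzipped copy. This is only a rough sketch: the sample rows and column names are made up, and whether Flipside can read compressed JSON is not something established in this thread.

```python
import gzip
import json


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON (one object per line)."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows).encode("utf-8")


# Hypothetical sample rows; the real portal tables have different columns.
rows = [
    {"round_id": i % 5, "project": f"project-{i % 50}", "amount_usd": 1.0 + i % 7}
    for i in range(1000)
]

raw = to_ndjson(rows)
compressed = gzip.compress(raw)

# Size on disk is only a proxy for query cost, but it shows how much
# redundancy the repeated keys add to an uncompressed JSON export.
print(f"raw JSON: {len(raw)} bytes")
print(f"gzipped:  {len(compressed)} bytes")
```

If the size gap is large on the real tables, that would be an argument for pushing only small, aggregated tables as JSON.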

DistributedDoge commented 7 months ago


Confirmed empirically that Dune does not let me upload a 25MB file. Bait and switch at its finest.
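Since the effective limit seems to differ from what the docs say, it may be worth checking the serialized size before attempting an upload. A minimal sketch, assuming the limit is passed in as a parameter; the 1MB figure, the helper names, and the sample metrics are all placeholders, not values confirmed by Dune:

```python
import csv
import io


def csv_size_bytes(rows, fieldnames):
    """Return the UTF-8 size of the rows once written as CSV with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return len(buffer.getvalue().encode("utf-8"))


def fits_upload_limit(rows, fieldnames, limit_bytes):
    """True if the CSV export of the rows is within the given byte limit."""
    return csv_size_bytes(rows, fieldnames) <= limit_bytes


# Assumption: roughly the Free plan limit discussed above; adjust as needed.
ASSUMED_LIMIT_BYTES = 1_000_000

# Hypothetical daily metrics table.
daily_metrics = [
    {"date": f"2024-01-{day:02d}", "donations": day * 10} for day in range(1, 31)
]

print(fits_upload_limit(daily_metrics, ["date", "donations"], ASSUMED_LIMIT_BYTES))
```

A check like this could run inside the asset that does the upload, so oversized tables are skipped instead of failing the whole run.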

davidgasquez commented 7 months ago

They're thinking about increasing it though.

I can poke them a bit more and check if that's a "future" plan or an actual thing happening in the coming weeks/months.

davidgasquez commented 7 months ago

Left a comment with a potential implementation approach.

DistributedDoge commented 7 months ago

I just learned we can in fact select assets using groups, compute kind or other properties.

This means we can probably construct asset job definitions so that, say, a CI run only triggers the default assets, while the weekly fetch triggers everything, including actually updating Dune or Flipside.

```python
from dagster import AssetSelection

# Select every asset that belongs to the given group.
AssetSelection.groups('group_label')
```

davidgasquez commented 7 months ago

Sounds like the perfect approach!

davidgasquez commented 6 months ago

For Flipside, another option could be to push data to a spreadsheet and have Flipside read from it.

davidgasquez commented 3 months ago

Seems like Dune live fetch is now open to all free users!