duneanalytics / spellbook

SQL views for Dune

Representing native tokens in prices models #6577

Open 0xRobin opened 3 weeks ago

0xRobin commented 3 weeks ago

Representing native tokens in prices models

Native tokens are often annoying to deal with when pulling in price info. I'll describe the current state with its problems and workarounds, and what I think the desired state should be.

1. Current state

Currently the native tokens are defined here: https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/tokens/models/prices/prices_native_tokens.sql#L34-L37

They have blockchain, contract_address and decimals set to null.

This is also how they show up in prices.usd (Dune query).
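For illustration, this is roughly how those rows can be pulled up (a sketch; the column values follow the model linked above):

```sql
-- The native token rows are identified only by symbol;
-- the usual join keys are null.
select blockchain, contract_address, symbol, decimals
from prices.usd
where blockchain is null
  and contract_address is null
-- e.g. rows like (null, null, 'ETH', null)
```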

Problems and current solutions

When you have a model that includes trades in both ERC-20 and native tokens, you have to adapt your query to deal with this. Most solutions follow one of these setups:

  1. Replace the native rows with a wrapped alternative

    select
       t.blockchain
      ,t.tx_hash
      ,t.currency_contract
      ,t.amount
      ,p.price
    from (
      select
         blockchain
        ,tx_hash
        -- swap the native placeholder address for WETH
        ,case when currency_contract = 0x0000000000000000000000000000000000000000
           then 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2
           else currency_contract
         end as currency_contract
        ,amount
      from trades
    ) t
    left join prices.usd p
      on p.blockchain = t.blockchain
      and p.contract_address = t.currency_contract

    Here we end up using the wrong price feed (WETH instead of ETH): this logic will break whenever there's a depeg event, and we also end up with the wrong symbol in the end table.

  2. Add extra logic in the join condition

    select
       t.blockchain
      ,t.tx_hash
      ,t.currency_contract
      ,t.amount
      ,p.price
    from trades t
    left join prices.usd p
      on (p.blockchain = t.blockchain
        and p.contract_address = t.currency_contract)
      or (t.currency_contract = 0x0000000000000000000000000000000000000000
        and p.blockchain is null
        and p.symbol = 'ETH')

    This works fine, but requiring blockchain to be null is confusing, and we're relying solely on the symbol column to determine the price feed, a very unsafe column that can hold arbitrary data. This is the current default and the pattern I've seen most. A note from the prices beta announcement reiterates the danger of relying on the symbol column:

Prices are calculated at the contract_address and blockchain level. Token symbol is not a unique identifier as different tokens may have the same symbol.

  3. Patch the prices model so it integrates better

https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/nft/macros/enrich_nft_trades.sql#L10-L35

This is mostly something I've been doing, haven't seen it adopted elsewhere.

  4. Join in a prices table for the native token separately:
     https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/nft/macros/nft_mints.sql#L92
     https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/nft/macros/nft_mints.sql#L124-L130
     https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/nft/macros/nft_mints.sql#L138-L143

Conclusion: joining prices with native tokens is tricky and very prone to producing join duplicates when small mistakes are made. This has been the subject of many data error investigations in spellbook.
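A cheap guard against such duplicates (a sketch, reusing the join keys from the examples above) is to assert uniqueness on the prices side before joining:

```sql
-- Sanity check: prices should be unique per
-- (blockchain, contract_address, minute).
-- Any rows returned here will fan out downstream joins.
select blockchain, contract_address, minute, count(*) as n
from prices.usd
group by blockchain, contract_address, minute
having count(*) > 1
-- note: the legacy native rows all share the key (null, null, minute),
-- so this check would flag them as potential duplicate sources
```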

2. Desired state

Prices should be uniquely identified by:

  * blockchain
  * contract_address
  * minute

and native tokens should follow this rule. When this is true, any join with prices would simply look like this:

left join prices.usd p
  on p.blockchain = t.blockchain
  and p.contract_address = t.currency_address
  and p.minute = date_trunc('minute', t.block_time)

No extra join logic, no case-when, no coalesce() in your select statement.

My proposal is to add all native tokens in the prices tables with:

  * blockchain set to the chain they are native to
  * contract_address set to a designated (non-null) native address
  * decimals set to the token's actual decimals

If there are different standards for representing the native token as an erc20 address (or precompile addresses on some L2s), we can either specify the contract address for each chain individually, or we can add all representations that make sense. E.g. we could have the ethereum price feed both at 0x00000.. and at 0xeeee.., so that users don't have to clean up their data if the source uses a different representation. Or we impose one native address per chain to force standardization.
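The "add all representations" option could be sketched as a small curated mapping that fans one native price feed out to every accepted address. The table and column names below are assumptions for illustration, not the actual spellbook models:

```sql
-- Sketch only: fan one curated native feed out to all accepted addresses.
with native_addresses (symbol, blockchain, contract_address) as (
    values
      ('ETH', 'ethereum', 0x0000000000000000000000000000000000000000)
    , ('ETH', 'ethereum', 0xeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee)
    -- one row per accepted representation, per chain
)
select
    a.blockchain
  , a.contract_address
  , n.symbol
  , n.decimals
  , n.minute
  , n.price
from prices_native_tokens n
join native_addresses a
  on a.symbol = n.symbol
-- joining on symbol is safe here only because this mapping is a
-- hand-curated seed, not arbitrary on-chain data
```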

This is a small change in spellbook and could be implemented quickly. The problem is that it would break any query that currently follows solution no. 2 described above (which is the one I've seen most).
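To make the breakage concrete, here is a hypothetical walk-through of the solution no. 2 join against the proposed rows (my reading of the failure modes, not an observed result):

```sql
select t.tx_hash, p.price
from trades t
left join prices.usd p
  on (p.blockchain = t.blockchain
    and p.contract_address = t.currency_contract)
  or (t.currency_contract = 0x0000000000000000000000000000000000000000
    and p.blockchain is null   -- no longer matches once blockchain is populated
    and p.symbol = 'ETH')
-- If the legacy null row is removed: native trades only match the first
-- branch, and only when contract_address is set to 0x0000...0000.
-- If the legacy row is kept alongside a populated 0x0000...0000 row:
-- each native trade matches twice -> join duplicates.
```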

jeff-dude commented 3 weeks ago

thinking out loud here on the prices pipeline and summarizing the above:

three different inputs, all write to separate tables, then final union view per level of granularity (minute, hour, day).

the table is then clean in terms of level of granularity and consistency of data written to all columns.

downstream:

expectation:

0xBoxer commented 3 weeks ago

The problem is that this change would break any query that currently follows solution no. 2 described above (which is the one I've seen most).

We could implement your desired state and carry the existing implementation forward. This would improve the UX for users today and keep more bad queries from being written.

Deprecating the existing rows is going to be very difficult, if not impossible, without breaking queries.

jeff-dude commented 3 weeks ago

We could implement your desired state and carry the existing implementation forward. This would improve the UX for users today and keep more bad queries from being written.

Deprecating the existing rows is going to be very difficult, if not impossible, without breaking queries.

not a bad idea if we want to avoid breaking changes. something like:

maybe there is something I'm not thinking of off the top of my head, but we could explore this

edit: this may still break one of the options above, after speaking a bit further with Rob on it. my suggestion is that we put a task in triage to prioritize this, which would run a test to see whether the above resolves all the scenarios described.