duneanalytics / spellbook

SQL views for Dune

Representing native tokens in prices models #6577

Open 0xRobin opened 3 weeks ago

0xRobin commented 3 weeks ago

Representing native tokens in prices models

Native tokens are often annoying to deal with when pulling in price info. I'll describe the current state with its problems and workarounds, and what I think the desired state should be.

1. Current state

Currently the native tokens are defined here: https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/tokens/models/prices/prices_native_tokens.sql#L34-L37

They have blockchain, contract_address and decimals set to null.

This is also how they show up in prices.usd (Dune query).
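For illustration, this is roughly how those rows can be pulled up (a sketch; the column values follow the model linked above):

```sql
-- The native token rows are identified only by symbol;
-- the usual join keys are null.
select blockchain, contract_address, symbol, decimals
from prices.usd
where blockchain is null
  and contract_address is null
-- e.g. rows like (null, null, 'ETH', null)
```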

Problems and current solutions

When you have a model that includes trades in both ERC-20 and native tokens, you have to adapt your query to deal with this. Most solutions follow one of these setups:

  1. Replace the native rows with a wrapped alternative

    select
       t.blockchain
      ,t.tx_hash
      ,t.currency_contract
      ,t.amount
      ,p.price
    from (
      select
         blockchain
        ,tx_hash
        -- swap the native placeholder address for WETH
        ,case when currency_contract = 0x0000000000000000000000000000000000000000
           then 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2
           else currency_contract
         end as currency_contract
        ,amount
      from trades
    ) t
    left join prices.usd p
      on p.blockchain = t.blockchain
      and p.contract_address = t.currency_contract

    Here we end up using the wrong price feed (WETH instead of ETH): this logic will break whenever there's a depeg event, and we also end up with the wrong symbol in the end table.

  2. Add extra logic in the join condition

    select
       t.blockchain
      ,t.tx_hash
      ,t.currency_contract
      ,t.amount
      ,p.price
    from trades t
    left join prices.usd p
      on (p.blockchain = t.blockchain
        and p.contract_address = t.currency_contract)
      or (t.currency_contract = 0x0000000000000000000000000000000000000000
        and p.blockchain is null
        and p.symbol = 'ETH')

    This works fine, but requiring blockchain to be null is confusing, and we're relying solely on the symbol column to determine the price feed, a very unsafe column that can hold arbitrary data. This is the current default and the pattern I've seen most. A note from the prices beta announcement reiterates the danger of relying on the symbol column:

Prices are calculated at the contract_address and blockchain level. Token symbol is not a unique identifier as different tokens may have the same symbol.

  3. Patch the prices model so it integrates better

https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/nft/macros/enrich_nft_trades.sql#L10-L35

This is mostly something I've been doing, haven't seen it adopted elsewhere.

  4. Join in a prices table for the native token separately:
     https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/nft/macros/nft_mints.sql#L92
     https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/nft/macros/nft_mints.sql#L124-L130
     https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/nft/macros/nft_mints.sql#L138-L143

Conclusion: joining prices with native tokens is tricky and very prone to producing join duplicates when small mistakes are made. This has been the subject of many data error investigations in spellbook.
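A cheap guard against such duplicates (a sketch, reusing the join keys from the examples above) is to assert uniqueness on the prices side before joining:

```sql
-- Sanity check: prices should be unique per
-- (blockchain, contract_address, minute).
-- Any rows returned here will fan out downstream joins.
select blockchain, contract_address, minute, count(*) as n
from prices.usd
group by blockchain, contract_address, minute
having count(*) > 1
-- note: the legacy native rows all share the key (null, null, minute),
-- so this check would flag them as potential duplicate sources
```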

2. Desired state

Prices should be uniquely identified by:

  * blockchain
  * contract_address
  * minute

and native tokens should follow this rule. When this is true, any join with prices would simply look like this:

left join prices.usd p
  on p.blockchain = t.blockchain
  and p.contract_address = t.currency_address
  and p.minute = date_trunc('minute', t.block_time)

No extra join logic, no case-when, no coalesce() in your select statement.

My proposal is to add all native tokens in the prices tables with:

  * blockchain set to the chain they are native to
  * contract_address set to a designated (non-null) native address
  * decimals set to the token's actual decimals

If there are different standards for representing the native token as an erc20 address (or precompile addresses on some L2s), we can either specify the contract address for each chain individually, or we can add all representations that make sense. E.g. we could have the ethereum price feed both at 0x00000.. and at 0xeeee.., so that users don't have to clean up their data if the source uses a different representation. Or we impose one native address per chain to force standardization.
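The "add all representations" option could be sketched as a small curated mapping that fans one native price feed out to every accepted address. The table and column names below are assumptions for illustration, not the actual spellbook models:

```sql
-- Sketch only: fan one curated native feed out to all accepted addresses.
with native_addresses (symbol, blockchain, contract_address) as (
    values
      ('ETH', 'ethereum', 0x0000000000000000000000000000000000000000)
    , ('ETH', 'ethereum', 0xeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee)
    -- one row per accepted representation, per chain
)
select
    a.blockchain
  , a.contract_address
  , n.symbol
  , n.decimals
  , n.minute
  , n.price
from prices_native_tokens n
join native_addresses a
  on a.symbol = n.symbol
-- joining on symbol is safe here only because this mapping is a
-- hand-curated seed, not arbitrary on-chain data
```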

This is a small change in spellbook and could be implemented quickly. The problem is that it would break any query that currently follows solution no. 2 described above (which is the one I've seen most).
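To make the breakage concrete, here is a hypothetical walk-through of the solution no. 2 join against the proposed rows (my reading of the failure modes, not an observed result):

```sql
select t.tx_hash, p.price
from trades t
left join prices.usd p
  on (p.blockchain = t.blockchain
    and p.contract_address = t.currency_contract)
  or (t.currency_contract = 0x0000000000000000000000000000000000000000
    and p.blockchain is null   -- no longer matches once blockchain is populated
    and p.symbol = 'ETH')
-- If the legacy null row is removed: native trades only match the first
-- branch, and only when contract_address is set to 0x0000...0000.
-- If the legacy row is kept alongside a populated 0x0000...0000 row:
-- each native trade matches twice -> join duplicates.
```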

jeff-dude commented 3 weeks ago

thinking out loud here on the prices pipeline and summarizing the above:

three different inputs, all write to separate tables, then final union view per level of granularity (minute, hour, day).

the table is then clean in terms of level of granularity and consistency of data written to all columns.

downstream:

expectation:

0xBoxer commented 3 weeks ago

The problem is that this change would break any query that currently follows solution no. 2 described above (which is the one I've seen most).

We could implement your desired state and carry the existing implementation forward. This would improve the UX for users today and keep more bad queries from being written.

Deprecating the existing rows is going to be very difficult, if not impossible, without breaking queries.

jeff-dude commented 3 weeks ago

We could implement your desired state and carry the existing implementation forward. This would improve the UX for users today and keep more bad queries from being written.

Deprecating the existing rows is going to be very difficult, if not impossible, without breaking queries.

not a bad idea if we want to avoid breaking changes. something like:

maybe there is something I'm not thinking of off the top of my head, but we could explore this

edit: this may still break one of the options above, after speaking a bit further with Rob on it. my suggestion is that we put a task in triage to prioritize this, which would run a test to see whether the above resolves all the scenarios described.