Open 0xRobin opened 3 weeks ago
thinking out loud here on the prices pipeline and summarizing the above:
- `contract_address`, `symbol`, `decimals` per chain
- three different inputs, all writing to separate tables, then a final union view per level of granularity (minute, hour, day)
- the table is then clean in terms of level of granularity and consistency of the data written to all columns
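As a rough sketch, the per-granularity union view described above could look like this (table and column names are assumptions, not the actual spellbook models):

```sql
-- hypothetical per-granularity union view; source table names are illustrative
CREATE OR REPLACE VIEW prices_minute AS
SELECT blockchain, contract_address, symbol, decimals, minute, price
FROM prices_source_a_minute
UNION ALL
SELECT blockchain, contract_address, symbol, decimals, minute, price
FROM prices_source_b_minute
UNION ALL
SELECT blockchain, contract_address, symbol, decimals, minute, price
FROM prices_source_c_minute
```

The same pattern would repeat for the hour and day granularities.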
downstream:
expectation:
The problem is that this change would break any query that currently follows solution no. 2 described above (which is the one I've seen most).
We could implement your desired state and carry the existing implementation forward. This would improve the UX for users today and keep more bad queries from being written.
Deprecating the existing rows is going to be very difficult, if not impossible, without breaking queries.
not a bad idea if we want to avoid breaking changes. something like:
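One way a non-breaking variant could be sketched (purely illustrative; keeps the legacy null-address rows and unions in duplicates under a canonical address):

```sql
-- hypothetical: keep legacy rows as-is, add new canonical-address rows alongside
SELECT blockchain, contract_address, symbol, decimals, minute, price
FROM prices.usd                       -- existing rows, native tokens with null address
UNION ALL
SELECT n.blockchain                   -- assumed to be derivable for native rows
     , 0x0000000000000000000000000000000000000000 AS contract_address
     , n.symbol
     , n.decimals
     , n.minute
     , n.price
FROM prices.usd n
WHERE n.contract_address IS NULL      -- re-emit native rows under the new address
```

This keeps old queries working while new queries can join on the explicit address.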
maybe there is something i'm not thinking of top of mind, but we could explore this
edit: after speaking a bit further with rob, this may still break one of the options above. my suggestion is that we put a task in triage to prioritize running a test to see whether the above would resolve all of the scenarios.
Representing native tokens in prices models
Native tokens are often annoying to deal with when pulling in price info. I'll describe the current state with its problems and solutions, and what I think should be the desired state.
1. Current state
Currently the native tokens are defined here: https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/tokens/models/prices/prices_native_tokens.sql#L34-L37
They have `blockchain`, `contract_address` and `decimals` as `null` values. This is also how they show up in `prices.usd` (dune query).

Problems and current solutions
When you have a model that includes trades in both erc20 and native tokens, you have to adapt your query to deal with this. Most solutions follow one of these setups:
1. Replace the native rows with a wrapped alternative
Here we end up using the wrong price feed (WETH instead of ETH); this logic will break whenever there's a depeg event, and we also end up with the wrong symbol in our end table.
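The wrapped-token workaround typically looks roughly like this (an illustrative sketch; table and column names are assumptions):

```sql
-- hypothetical: substitute WETH for native ETH before joining prices
SELECT t.block_time
     , t.amount_raw / power(10, p.decimals) * p.price AS amount_usd
FROM trades t
LEFT JOIN prices.usd p
    ON p.blockchain = t.blockchain
   AND p.minute = date_trunc('minute', t.block_time)
   AND p.contract_address = coalesce(
         t.token_address,
         0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2)  -- WETH stand-in for native ETH
```

The `coalesce` is exactly where the wrong feed (and wrong symbol) sneaks in.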
2. Add extra logic in the join condition
This works fine, but requiring `blockchain` to be `null` is confusing, and we're relying solely on the `symbol` column here to determine the price feed, which is a very unsafe column that can hold arbitrary data. This is the current default and the one I've seen the most. A note from the prices beta announcement reiterates the danger of relying on the symbol column: https://github.com/duneanalytics/spellbook/blob/d948aee66f5f8b7a89882b45a7d70117d56a449b/dbt_subprojects/nft/macros/enrich_nft_trades.sql#L10-L35
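The extra join logic in this setup usually looks something like the following (an illustrative sketch; names are assumptions, and note it leans on `symbol`, which is exactly the unsafe part):

```sql
-- hypothetical: special-case the native token inside the join condition
LEFT JOIN prices.usd p
    ON p.minute = date_trunc('minute', t.block_time)
   AND (
        (p.blockchain = t.blockchain AND p.contract_address = t.token_address)
     OR (t.token_address IS NULL AND p.blockchain IS NULL AND p.symbol = 'ETH')
   )
```

Any mistake in the `OR` branch (e.g. a symbol collision) silently produces join duplicates.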
This is mostly something I've been doing; I haven't seen it adopted elsewhere.
Conclusion: joining prices with native tokens can be tricky and is very prone to producing join duplicates when small errors are made. This has been the subject of many data error investigations in spellbook.
2. Desired state
Prices should be uniquely identified by:
- `blockchain`
- `contract_address`
- `timestamp`
and native tokens should follow this rule. When this is true, any join with prices would simply look like this:
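Under that constraint, the join could be as simple as (sketch; table and column names assumed):

```sql
-- hypothetical: one uniform join for erc20 and native tokens alike
LEFT JOIN prices.usd p
    ON p.blockchain = t.blockchain
   AND p.contract_address = t.token_address
   AND p.minute = date_trunc('minute', t.block_time)
```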
No extra join logic, no case-when, no `coalesce()` in your select statement.
My proposal is to add all native tokens in the prices tables with:

- `0x0000000000000000000000000000000000000000` as `contract_address`
- `blockchain` filled in correctly
- `decimals` filled in correctly

If there are different standards for representing the native token as an erc20 address (or precompile addresses on some L2s), we can either specify the contract address for each chain individually, or we can add all representations that make sense. E.g. we could have the ethereum price feed both at `0x00000..` and at `0xeeee...` so that users don't have to clean up their data if the source uses a different representation. OR we impose 1 native address per chain to force standardization.

This is a small change in spellbook and could be quickly implemented. The problem is that this change would break any query that currently follows solution no. 2 described above (which is the one I've seen most).
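A sketch of what the proposed native-token rows might look like (values illustrative, not the actual seed data):

```sql
-- hypothetical rows in prices_native_tokens under the proposal
SELECT 'ethereum' AS blockchain
     , 0x0000000000000000000000000000000000000000 AS contract_address
     , 'ETH' AS symbol
     , 18 AS decimals
UNION ALL
SELECT 'polygon'
     , 0x0000000000000000000000000000000000000000
     , 'MATIC'
     , 18
```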