datawaves-xyz / dbt_ethereum_source

MIT License
0 stars 2 forks source link

stage layer #11

Open TalkWIthKeyboard opened 2 years ago

TalkWIthKeyboard commented 2 years ago

Question 1: In which layer type conversion should be done? staging models or source models? Say contract A has 100 call/event models. If type conversion is done in staging models, we will have another 100 models (views likely).

Question 2: Is wyvern_data.sql a staging model (which should reside in dbt_ethereum_source) or a business model?

Question 3: If we need to create a staging table for each event/call table (e.g. punk_transfer / wyvern_data), which path should we put it in? If it resides in the same schema of source models, how can we deal with naming conflicts?

For Question 3, I have figured out a pattern for how we organize those models. It will make the naming of our tables shorter.

First stage: ethereum-etl

node API --[ethereum-etl]-> ethereum.traces

Second stage: dbt-ethereum-source

ethereum.traces --[dbt enrich]-> ethereum_stg.traces --[dbt decode]-> ethereum_opensea.wvyern_call_atomicmatch --[dbt enrich]-> ethereum_opensea_stg.atomic_match

Third stage: dbt-datawaves

ethereum_opensea_stg.atomic_match --[dbt business]-> opensea.trades --[dbt business]-> nft.trades

In the business dbt repo, we don't add a chain name as a prefix or use it as a database name, and the project should be renamed into dbt_datawaves or something because the modeled data are chain free. (Considering that, we can further rename dbt-ethereum-source to dbt-ethereum.)

I notice dune v2 also takes this approach. They use column blockchain to differentiate chains. All chain records combine into one table.

image

If opensea.trades view is general, we may open-source another project dbt-opensea for ease of maintenance.

Since dbt requires models to have unique names. opensea.trades and nft.trades must be achieved using alias and overwriting custom_schema. But this will make test hard. Since we don't utilize target_schema field.

Another pattern I suggest is to use some alias like core or event:

ethereum_opensea_stg.atomic_match --[dbt business]-> event.opensea_trades --[dbt business]-> event.nft_trades

onesuper commented 2 years ago

Modified some contents. I have moved ethereum_stg.traces to the first stage since it is more source-centric and closer to the existing pipeline.

First stage: ethereum-etl

node API --[ethereum-etl]-> ethereum.traces --[dbt enrich]-> ethereum_stg.traces

Second stage: dbt-ethereum-source

ethereum_stg.traces --[dbt decode]-> ethereum_opensea.wvyern_call_atomicmatch --[dbt enrich]-> ethereum_opensea_stg.atomic_match

Third stage: dbt-datawaves

ethereum_opensea_stg.atomic_match --[dbt business]-> opensea.trades --[dbt business]-> nft.trades