blockchain-etl / ethereum-etl-airflow

Airflow DAGs for exporting, loading, and parsing Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery: https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee
MIT License

Missing logs_by_topic table #316

Open johnayoung opened 2 years ago

johnayoung commented 2 years ago

Hi all,

When attempting to run the DAGs for the first time, we are unable to see where the "logs_by_topic" table is getting populated for our log event.

WITH parsed_logs AS (
    SELECT
        logs.block_timestamp AS block_timestamp
        ,logs.block_number AS block_number
        ,logs.transaction_hash AS transaction_hash
        ,logs.log_index AS log_index
        ,logs.address AS contract_address
        ,`<project-id>-internal.ethereum_<entity>_blockchain_etl.parse_<smart-contract>_event_<event-name>`(logs.data, logs.topics) AS parsed
    FROM `<project-id>-internal.crypto_ethereum_partitioned.logs_by_topic_0x8c5` AS logs
    WHERE
        address IN (LOWER('<address>'))
        AND topics[SAFE_OFFSET(0)] = '<topic>'
        -- live
)
SELECT
    block_timestamp
    ,block_number
    ,transaction_hash
    ,log_index
    ,contract_address
    ,parsed.owner AS `owner`
    ,parsed.spender AS `spender`
    ,parsed.value AS `value`
FROM parsed_logs
WHERE parsed IS NOT NULL

The section in question is this guy:

...
FROM `<project-id>-internal.crypto_ethereum_partitioned.logs_by_topic_0x8c5` AS logs
...

We know this is part of the "live" real-time update section, but what actually populates the table with the topics we specify? Is this done in a different repo?
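
For anyone else hitting this, a quick way to list which of these topic-partitioned tables actually exist in the dataset (project and dataset names below are the same placeholders as in the query above):

SELECT table_name
FROM `<project-id>-internal.crypto_ethereum_partitioned.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE 'logs_by_topic_%'
ORDER BY table_name;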

medvedev1088 commented 2 years ago

It's done in this repo https://github.com/blockchain-etl/blockchain-etl-dataflow/blob/master/partitioned_tables.md
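
To illustrate the shape of those tables, a one-off backfill would look roughly like the sketch below. The table, project, and topic-prefix names are assumptions carried over from the query above, and the actual real-time population is handled by the Dataflow pipeline described in that doc, not by running a query like this.

-- Hypothetical backfill of a topic-partitioned logs table (names assumed);
-- live updates come from the blockchain-etl-dataflow streaming pipeline.
CREATE TABLE IF NOT EXISTS `<project-id>-internal.crypto_ethereum_partitioned.logs_by_topic_0x8c5`
PARTITION BY DATE(block_timestamp)
AS
SELECT *
FROM `bigquery-public-data.crypto_ethereum.logs`
WHERE STARTS_WITH(topics[SAFE_OFFSET(0)], '0x8c5');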

johnayoung commented 2 years ago

Thanks a ton @medvedev1088 for the quick response.

So am I correct in assuming:

Are both of these repos set up enough that we can plug and play our own implementation? We plan on contributing to the dataset ecosystem, but we need a custom implementation for some edge cases.

medvedev1088 commented 2 years ago

@johnayoung yes those assumptions are correct. The code in the repos is sufficient to set the system up.

On your last point, what datasets are you planning on contributing?