blockchain-etl / ethereum-etl-airflow

Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee

Parsing Debugger #762

Closed araa47 closed 3 months ago

araa47 commented 3 months ago

What?

We sometimes run into issues where the contract parser doesn't seem to generate the files we expect. In these cases it's currently hard to debug, since we have no feedback on whether the parsing will work as intended.

[x] Log Parsing Support
[ ] Trace Parsing Support (WIP)

How?

I have created a generate_parse_sql.py file that can be called with the path of the table definition JSON file and a date.
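The script itself isn't reproduced in this issue, so the following is only a minimal sketch of the approach, assuming eth_utils is available for hashing the event signature; the function names and the abbreviated SQL template are illustrative, not the actual implementation.

# Hypothetical sketch of generate_parse_sql.py (illustrative only, not the real script).
import json
import sys

from eth_utils import keccak  # assumed dependency for keccak256 hashing


def event_topic0(abi):
    # Hash the canonical event signature, e.g. PoolCreated(address,bytes32).
    signature = abi["name"] + "(" + ",".join(inp["type"] for inp in abi["inputs"]) + ")"
    return "0x" + keccak(text=signature).hex()


def generate_parse_sql(table_definition_path, date):
    with open(table_definition_path) as f:
        definition = json.load(f)

    sig = event_topic0(definition["parser"]["abi"])
    json_data = json.dumps(definition)

    # Abbreviated template; the real query is the full CTE shown under "Output" below.
    return f"""
WITH abi AS (
  SELECT
    JSON_QUERY(json_data, '$.parser.abi') AS abi,
    JSON_EXTRACT_SCALAR(json_data, '$.parser.contract_address') AS contract_address
  FROM (SELECT '{json_data}' AS json_data)
)
SELECT
  l.*,
  `blockchain-etl-internal.common.parse_log`(l.data, l.topics, REPLACE(a.abi, "'", '"')) AS parsed_log
FROM `bigquery-public-data.crypto_ethereum.logs` AS l
INNER JOIN abi AS a
  ON IFNULL(l.topics[SAFE_OFFSET(0)], "") = '{sig}'
WHERE DATE(l.block_timestamp) = DATE("{date}")
  AND (a.contract_address IS NULL OR l.address = LOWER(a.contract_address))
LIMIT 100
""".strip()


if __name__ == "__main__":
    print(generate_parse_sql(sys.argv[1], sys.argv[2]))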

Command

The command below tries to verify the JSON for the ajna_v2 ERC721PoolFactory_event_PoolCreated events. Note that I selected 2024-01-08 because I know from Etherscan that events were produced on that date.

python3 generate_parse_sql.py dags/resources/stages/parse/table_definitions/ajna_v2/ERC721PoolFactory_event_PoolCreated.json 2024-01-08

Output

WITH
abi AS 
(
SELECT
    JSON_QUERY(json_data, '$.parser.abi') AS abi,
    JSON_QUERY(json_data, '$.parser.field_mapping') AS field_mapping,
    JSON_QUERY(json_data, '$.table') AS table_details,
    JSON_EXTRACT_SCALAR(json_data, '$.parser.contract_address') AS contract_address,
    CAST(JSON_EXTRACT_SCALAR(json_data, '$.parser.type') AS STRING) AS parser_type
FROM (
    SELECT '{"parser": {"abi": {"anonymous": false, "inputs": [{"indexed": false, "internalType": "address", "name": "pool_", "type": "address"}, {"indexed": false, "internalType": "bytes32", "name": "subsetHash_", "type": "bytes32"}], "name": "PoolCreated", "type": "event"}, "contract_address": "0x27461199d3b7381de66a85d685828e967e35af4c", "field_mapping": {}, "type": "log"}, "table": {"dataset_name": "ajna_v2", "schema": [{"description": "", "name": "pool_", "type": "STRING"}, {"description": "", "name": "subsetHash_", "type": "STRING"}], "table_description": "", "table_name": "ERC721PoolFactory_event_PoolCreated"}}' AS json_data
)
),

details AS (
    SELECT 
        '0xee1fe091a5213b321c2662b35c0b7cd0d35d10dbcab52b3c9b8768983c67bce3' AS sig,
        abi.*
    FROM abi
),

logs AS (
  SELECT
    l.*,
    a.sig, 
    `blockchain-etl-internal.common.parse_log`(
            l.data,
            l.topics,
            REPLACE(a.abi, "'", '"')
        ) AS parsed_log
  FROM
    `bigquery-public-data.crypto_ethereum.logs` AS l
  INNER JOIN
        details AS a
    ON
        IFNULL(l.topics[SAFE_OFFSET(0)], "") = a.sig
  WHERE
    DATE(l.block_timestamp) = DATE("2024-01-08")
    AND (a.contract_address IS NULL OR l.address = LOWER(a.contract_address))
)

SELECT * FROM logs
LIMIT 100

This will generate SQL for you to test the parsing and verify everything is working as expected.

Running SQL in BQ

(Screenshot: the generated SQL running in the BigQuery console, 2024-07-10)

We can use the parsed_log column to verify that everything we need is parsed correctly.
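Not part of this issue, but for completeness, one way to run the generated SQL from Python and eyeball the parsed_log column is with the BigQuery client library. This assumes google-cloud-bigquery is installed and application-default credentials are configured; verify_parsing is a hypothetical helper name.

# Hypothetical helper for inspecting parsed_log from Python (not part of this issue).
from google.cloud import bigquery


def verify_parsing(sql):
    client = bigquery.Client()         # uses application-default credentials
    rows = client.query(sql).result()  # runs the query and waits for completion
    for row in rows:
        # parsed_log holds the decoded event fields; its exact shape depends on the parse_log UDF.
        print(row["block_number"], row["transaction_hash"], row["parsed_log"])

# Usage: pass the SQL string printed by generate_parse_sql.py, e.g.
#   verify_parsing(generated_sql)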