blockchain-etl / ethereum-etl-airflow

Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee
MIT License

Missing `blocks` table when running first time #390

Open r4881t opened 2 years ago

r4881t commented 2 years ago

Hi,

I followed the steps for GCP and created the project and datasets. When running for the first time, I get the following error in `ethereum_verify_streaming_dag`:

[2022-07-27 03:40:24,285] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: ethereum_verify_streaming_dag.verify_blocks_have_latest 2022-07-27T02:30:00+00:00 [queued]>
[2022-07-27 03:40:24,356] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: ethereum_verify_streaming_dag.verify_blocks_have_latest 2022-07-27T02:30:00+00:00 [queued]>
[2022-07-27 03:40:24,356] {taskinstance.py:880} INFO - 
--------------------------------------------------------------------------------
[2022-07-27 03:40:24,356] {taskinstance.py:881} INFO - Starting attempt 6 of 6
[2022-07-27 03:40:24,356] {taskinstance.py:882} INFO - 
--------------------------------------------------------------------------------
[2022-07-27 03:40:24,392] {taskinstance.py:901} INFO - Executing <Task(BigQueryOperator): verify_blocks_have_latest> on 2022-07-27T02:30:00+00:00
[2022-07-27 03:40:24,406] {standard_task_runner.py:54} INFO - Started process 1648 to run task
[2022-07-27 03:40:24,718] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'ethereum_verify_streaming_dag', 'verify_blocks_have_latest', '2022-07-27T02:30:00+00:00', '--job_id', '33761', '--pool', 'default_pool', '--raw', '-sd', '/opt/airflow/dags/repo/dags/ethereum_verify_streaming_dag.py', '--cfg_path', '/tmp/tmp5tv7mx2g']
[2022-07-27 03:40:24,719] {standard_task_runner.py:78} INFO - Job 33761: Subtask verify_blocks_have_latest
[2022-07-27 03:40:24,884] {logging_mixin.py:120} INFO - Running <TaskInstance: ethereum_verify_streaming_dag.verify_blocks_have_latest 2022-07-27T02:30:00+00:00 [running]> on host airflow-worker-0.airflow-worker.etlapp.svc.cluster.local
[2022-07-27 03:40:25,024] {bigquery_operator.py:252} INFO - Executing: select if(
(
select timestamp_diff(
  current_timestamp(),
  (select max(timestamp)
  from `elaborate-baton-357506.crypto_ethereum.blocks` as blocks
  where date(timestamp) >= date_add('2022-07-27', INTERVAL -1 DAY)),
  MINUTE)
) < 1, 1,
cast((select 'Blocks are lagging by more than 1 minutes') as INT64))
[2022-07-27 03:40:25,953] {taskinstance.py:1150} ERROR - BigQuery job failed. Final error was: {'reason': 'notFound', 'message': 'Not found: Table elaborate-baton-357506:crypto_ethereum.blocks was not found in location asia-south1'}. The job was: {'kind': 'bigquery#job', 'etag': 'Q5nYP18pPfEi2HSIjDP7sg==', 'id': 'elaborate-baton-357506:asia-south1.job_kbdzE9BYIJP-ueO-O_trlXAuvRcw', 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/elaborate-baton-357506/jobs/job_kbdzE9BYIJP-ueO-O_trlXAuvRcw?location=asia-south1', 'user_email': 'etlapp@elaborate-baton-357506.iam.gserviceaccount.com', 'configuration': {'query': {'query': "select if(\n(\nselect timestamp_diff(\n  current_timestamp(),\n  (select max(timestamp)\n  from `elaborate-baton-357506.crypto_ethereum.blocks` as blocks\n  where date(timestamp) >= date_add('2022-07-27', INTERVAL -1 DAY)),\n  MINUTE)\n) < 1, 1,\ncast((select 'Blocks are lagging by more than 1 minutes') as INT64))", 'priority': 'INTERACTIVE', 'useLegacySql': False}, 'jobType': 'QUERY'}, 'jobReference': {'projectId': 'elaborate-baton-357506', 'jobId': 'job_kbdzE9BYIJP-ueO-O_trlXAuvRcw', 'location': 'asia-south1'}, 'statistics': {'creationTime': '1658893225735', 'startTime': '1658893225826', 'endTime': '1658893225826'}, 'status': {'errorResult': {'reason': 'notFound', 'message': 'Not found: Table elaborate-baton-357506:crypto_ethereum.blocks was not found in location asia-south1'}, 'errors': [{'reason': 'notFound', 'message': 'Not found: Table elaborate-baton-357506:crypto_ethereum.blocks was not found in location asia-south1'}], 'state': 'DONE'}}
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/contrib/operators/bigquery_operator.py", line 262, in execute
    job_id = self.bq_cursor.run_query(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 915, in run_query
    return self.run_with_configuration(configuration)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 1347, in run_with_configuration
    raise Exception(
Exception: BigQuery job failed. Final error was: {'reason': 'notFound', 'message': 'Not found: Table elaborate-baton-357506:crypto_ethereum.blocks was not found in location asia-south1'}. The job was: {'kind': 'bigquery#job', 'etag': 'Q5nYP18pPfEi2HSIjDP7sg==', 'id': 'elaborate-baton-357506:asia-south1.job_kbdzE9BYIJP-ueO-O_trlXAuvRcw', 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/elaborate-baton-357506/jobs/job_kbdzE9BYIJP-ueO-O_trlXAuvRcw?location=asia-south1', 'user_email': 'etlapp@elaborate-baton-357506.iam.gserviceaccount.com', 'configuration': {'query': {'query': "select if(\n(\nselect timestamp_diff(\n  current_timestamp(),\n  (select max(timestamp)\n  from `elaborate-baton-357506.crypto_ethereum.blocks` as blocks\n  where date(timestamp) >= date_add('2022-07-27', INTERVAL -1 DAY)),\n  MINUTE)\n) < 1, 1,\ncast((select 'Blocks are lagging by more than 1 minutes') as INT64))", 'priority': 'INTERACTIVE', 'useLegacySql': False}, 'jobType': 'QUERY'}, 'jobReference': {'projectId': 'elaborate-baton-357506', 'jobId': 'job_kbdzE9BYIJP-ueO-O_trlXAuvRcw', 'location': 'asia-south1'}, 'statistics': {'creationTime': '1658893225735', 'startTime': '1658893225826', 'endTime': '1658893225826'}, 'status': {'errorResult': {'reason': 'notFound', 'message': 'Not found: Table elaborate-baton-357506:crypto_ethereum.blocks was not found in location asia-south1'}, 'errors': [{'reason': 'notFound', 'message': 'Not found: Table elaborate-baton-357506:crypto_ethereum.blocks was not found in location asia-south1'}], 'state': 'DONE'}}
[2022-07-27 03:40:25,956] {taskinstance.py:1187} INFO - Marking task as FAILED. dag_id=ethereum_verify_streaming_dag, task_id=verify_blocks_have_latest, execution_date=20220727T023000, start_date=20220727T034024, end_date=20220727T034025
[2022-07-27 03:40:29,224] {local_task_job.py:102} INFO - Task exited with return code 1

I have the following DAGs running:

[Screenshot 2022-07-27 at 9 16 47 AM]

Which step creates these tables?

araa47 commented 2 years ago

The `load_dag` should be the DAG that creates this table: it loads the exported data from GCS into BigQuery.
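
To double-check which table and location the verify task is complaining about, the `notFound` message can be picked apart. The snippet below is only a rough sketch and not part of this repo: `parse_missing_table` and `MissingTable` are hypothetical names, and the regular expression assumes the message has exactly the shape shown in the log above.

```python
import re
from typing import NamedTuple, Optional


class MissingTable(NamedTuple):
    """Hypothetical container for the pieces of a BigQuery notFound message."""
    project: str
    dataset: str
    table: str
    location: str


# Assumed message shape, copied from the log in this issue:
# "Not found: Table <project>:<dataset>.<table> was not found in location <location>"
_NOT_FOUND = re.compile(
    r"Not found: Table (?P<project>[^:\s]+):(?P<dataset>[^.\s]+)\.(?P<table>\S+)"
    r" was not found in location (?P<location>[\w-]+)"
)


def parse_missing_table(message: str) -> Optional[MissingTable]:
    """Return the missing table's coordinates, or None if the message does not match."""
    match = _NOT_FOUND.search(message)
    if match is None:
        return None
    return MissingTable(**match.groupdict())


if __name__ == "__main__":
    msg = (
        "Not found: Table elaborate-baton-357506:crypto_ethereum.blocks "
        "was not found in location asia-south1"
    )
    missing = parse_missing_table(msg)
    if missing is not None:
        # The table named here is the one the load step is expected to create.
        print(f"{missing.dataset}.{missing.table} is missing in {missing.location}")
```

If the reported table is `crypto_ethereum.blocks`, the thing to check is whether the load step has completed successfully at least once; until it has, the verify query has nothing to read.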