coderxio / sagerx

Open drug data pipelines curated by pharmacists.
https://coderx.io/sagerx
Other
45 stars 12 forks

RxNorm DAG returns dbt error about NDC mart #263

Closed: jrlegrand closed this issue 4 months ago

jrlegrand commented 7 months ago

Problem Statement

The RxNorm DAG failed with an error about a mart dependency while running the RxNorm staging and intermediate models. I have no idea why it would check dependencies for a mart, since the command does not --select any marts to run.
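One likely explanation, offered as background rather than a confirmed diagnosis: dbt parses the whole project and resolves every ref() while building its graph, and only then applies --select. A model outside the selection that refs a missing node can therefore still fail the run. The toy model below mimics that order of operations. It is not dbt's code, and the model names other than all_ndcs_to_sources and stg_rxnorm_historical__ndcs are hypothetical.

```python
import re

# Toy illustration (NOT dbt's implementation): every ref() in the project is
# resolved before the --select style filter is applied.


class CompilationError(Exception):
    pass


# Hypothetical miniature project: model name -> SQL text.
PROJECT = {
    "stg_rxnorm__ndcs": "select 1 as ndc",
    "int_rxnorm__ndcs": "select * from {{ ref('stg_rxnorm__ndcs') }}",
    # A mart outside the selection that refs a model missing from the project.
    "all_ndcs_to_sources": "select * from {{ ref('stg_rxnorm_historical__ndcs') }}",
}

REF_PATTERN = re.compile(r"ref\(\s*'([^']+)'\s*\)")


def build_graph(project):
    """Resolve the refs of every model, selected or not."""
    graph = {}
    for model, sql in project.items():
        deps = REF_PATTERN.findall(sql)
        for dep in deps:
            if dep not in project:
                raise CompilationError(
                    f"Model '{model}' depends on a node named '{dep}' which was not found"
                )
        graph[model] = deps
    return graph


def run(project, selected):
    """Build the full graph first, then keep only the selected models."""
    graph = build_graph(project)  # fails here, before selection matters
    return [model for model in graph if model in selected]


if __name__ == "__main__":
    try:
        run(PROJECT, {"stg_rxnorm__ndcs", "int_rxnorm__ndcs"})
    except CompilationError as err:
        print(f"Compilation Error\n  {err}")
```

Removing the broken mart from the toy project lets the same selection run, which mirrors the Criteria for Success below.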

Criteria for Success

RxNorm DAG runs to completion independent of any marts.

Additional Information

I WAS running other dbt commands for these marts around the time this RxNorm DAG was running. I'm not sure whether that would cause errors in Airflow.

Error message:

[2024-03-06, 03:17:34 UTC] {subprocess.py:75} INFO - Running command: ['dbt', 'run', '--select', 'models/staging/rxnorm', 'models/intermediate/rxnorm']
[2024-03-06, 03:17:34 UTC] {subprocess.py:86} INFO - Output:
[2024-03-06, 03:17:37 UTC] {subprocess.py:93} INFO - 03:17:36  Running with dbt=1.4.1
[2024-03-06, 03:17:37 UTC] {subprocess.py:93} INFO - 03:17:37  Encountered an error:
[2024-03-06, 03:17:37 UTC] {subprocess.py:93} INFO - Compilation Error
[2024-03-06, 03:17:37 UTC] {subprocess.py:93} INFO -   Model 'model.sagerx.all_ndcs_to_sources' (models/marts/ndc/all_ndcs_to_sources.sql) depends on a node named 'stg_rxnorm_historical__ndcs' which was not found
[2024-03-06, 03:17:37 UTC] {subprocess.py:97} INFO - Command exited with return code 2
[2024-03-06, 03:17:37 UTC] {logging_mixin.py:137} INFO - Result from dbt: SubprocessResult(exit_code=2, output="  Model 'model.sagerx.all_ndcs_to_sources' (models/marts/ndc/all_ndcs_to_sources.sql) depends on a node named 'stg_rxnorm_historical__ndcs' which was not found")
[2024-03-06, 03:17:37 UTC] {python.py:177} INFO - Done. Returned value was: None
[2024-03-06, 03:17:37 UTC] {taskinstance.py:1323} INFO - Marking task as SUCCESS. dag_id=rxnorm, task_id=transform, execution_date=20240306T031208, start_date=20240306T031734, end_date=20240306T031737
[2024-03-06, 03:17:37 UTC] {local_task_job.py:208} INFO - Task exited with return code 0
[2024-03-06, 03:17:37 UTC] {taskinstance.py:2578} INFO - 0 downstream tasks scheduled from follow-on schedule check
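
Note that the log above marks the transform task as SUCCESS even though dbt exited with return code 2: the result is logged but apparently not checked. Below is a minimal sketch, assuming the transform step shells out to dbt from a Python callable, of a hypothetical helper run_dbt (not sagerx's actual code) that raises on a non-zero exit code. An unhandled exception makes Airflow mark the task as failed (or up for retry) rather than SUCCESS.

```python
import subprocess


class DbtRunError(RuntimeError):
    """Raised when a dbt command exits with a non-zero return code."""


def run_dbt(command, cwd=None):
    """Run a dbt command (hypothetical helper) and raise if it fails."""
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    print(result.stdout)  # inside an Airflow task, stdout ends up in the task log
    if result.returncode != 0:
        # Raising instead of returning lets Airflow record the failure.
        raise DbtRunError(
            f"{command!r} exited with code {result.returncode}:\n"
            f"{result.stdout}{result.stderr}"
        )
    return result.stdout


# Example call mirroring the failing command (the cwd path is hypothetical):
# run_dbt(["dbt", "run", "--select", "models/staging/rxnorm",
#          "models/intermediate/rxnorm"], cwd="/path/to/dbt/project")
```

If the DAG uses Airflow's SubprocessHook, as the subprocess.py log lines suggest, the same check could be applied to the exit_code field of the returned SubprocessResult, raising an exception when it is non-zero.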

lprzychodzien commented 5 months ago

I don't think this is a bug; it looks more like a dbt conflict caused by running several dbt commands at the same time. I recently ran the RxNorm DAG both as a preliminary run and as a refresh run, and I was not able to reproduce the issue. You are right that the model 'model.sagerx.all_ndcs_to_sources' should not be (and is not) part of the RxNorm DAG.
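If overlapping dbt invocations are the cause, as suggested in the comment above, one hypothetical mitigation is to serialize them. The sketch below assumes the manual runs and the DAG's dbt task execute on the same Unix host and can see the same lock file (fcntl is Unix-only); dbt_lock, locked_run, and the .dbt.lock path are illustrative names, not part of sagerx or dbt.

```python
import fcntl
import subprocess
from contextlib import contextmanager

# Hypothetical lock file; every dbt invocation must use the same path.
DEFAULT_LOCK_PATH = ".dbt.lock"


@contextmanager
def dbt_lock(lock_path=DEFAULT_LOCK_PATH):
    """Block until no other process holds the advisory lock, then hold it."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # waits while another process holds it
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def locked_run(command, cwd=None, lock_path=DEFAULT_LOCK_PATH):
    """Run a command while holding the lock; raise CalledProcessError on failure."""
    with dbt_lock(lock_path):
        return subprocess.run(
            command, cwd=cwd, check=True, capture_output=True, text=True
        )
```

The lock only helps if every invocation takes it. Manual runs from a shell would need to acquire the same file, for example with the util-linux flock command (flock /path/to/.dbt.lock dbt run ...).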

jrlegrand commented 4 months ago

Agreed; closing until we see this again. I haven't seen it since that first occurrence.