coderxio / sagerx

Open drug data pipelines curated by pharmacists.
https://coderx.io/sagerx

"PRD" RxNorm Historical errors out with SIGKILL message #307

Open: jrlegrand opened this issue 3 months ago

jrlegrand commented 3 months ago

Problem Statement

The DAG runs for a while and then errors out. It doesn't do this on my local machine. Brief googling suggests it's memory-related, but it's hard to diagnose. See: https://github.com/apache/airflow/issues/10435

Error message:

[2024-07-10, 11:48:05 UTC] {local_task_job.py:208} INFO - Task exited with return code Negsignal.SIGKILL

Criteria for Success

The DAG runs to completion in "PRD".

Additional Information

jrlegrand commented 3 months ago

This is also breaking my rxnorm_historical DAG locally.

Assuming it's a memory issue, see: https://medium.com/brexeng/debugging-and-preventing-memory-errors-in-python-e00be55e7cf2
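One way to test the memory hypothesis is to log memory usage from inside the task at a few checkpoints. A minimal sketch using only the standard library: `tracemalloc` tracks allocations made by Python code, and `resource.getrusage` reports the process's peak resident set size (the `resource` module is Unix-only). The `log_memory` helper and the stand-in workload are hypothetical, not part of sagerx.

```python
import logging
import resource
import tracemalloc

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)


def log_memory(label: str) -> tuple[int, int]:
    """Log traced Python memory and peak RSS (hypothetical diagnostic helper)."""
    # Bytes allocated by Python since tracemalloc.start(); (0, 0) if not tracing.
    current, peak = tracemalloc.get_traced_memory()
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    log.info(
        "%s: traced current=%.1f MiB, traced peak=%.1f MiB, max RSS=%s",
        label, current / 2**20, peak / 2**20, max_rss,
    )
    return current, peak


if __name__ == "__main__":
    tracemalloc.start()
    log_memory("before API calls")
    # Stand-in for accumulated API responses.
    responses = [b"x" * 1024 for _ in range(10_000)]
    log_memory("after API calls")
```

If the logged peak climbs steadily toward the worker's memory limit right before the SIGKILL, that points to the OOM killer rather than a hang.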

It gets through all the API calls and then silently crashes before it exits the function that makes the concurrent API calls. My guess is that ThreadPoolExecutor is somehow taking up too much memory.
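If the crash happens after every request has returned, one plausible culprit is that all futures and their response bodies are held in memory at once. A minimal sketch, assuming a hypothetical `fetch(item)` function and `handle(result)` sink (neither is sagerx's actual code): it keeps at most `max_in_flight` futures pending and hands each result off as soon as it completes, so memory tracks the window size rather than the total number of calls.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_bounded(
    items: Iterable[T],
    fetch: Callable[[T], R],
    handle: Callable[[R], None],
    max_workers: int = 8,
    max_in_flight: int = 64,
) -> int:
    """Run fetch() over items with at most max_in_flight pending futures.

    Each result is passed to handle() as soon as it completes and is not
    kept afterwards. Returns the number of items processed.
    """
    processed = 0
    pending: set[Future] = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for item in items:
            pending.add(pool.submit(fetch, item))
            if len(pending) >= max_in_flight:
                # Block until at least one future finishes, then drain those.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    handle(fut.result())  # re-raises any exception from fetch()
                    processed += 1
        # All items submitted; drain whatever is still running.
        for fut in wait(pending).done:
            handle(fut.result())
            processed += 1
    return processed
```

The window alone only limits how many responses are in flight; what actually caps memory is having `handle` write each result out (to disk or a staging table, say) instead of appending it to an ever-growing list.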

jrlegrand commented 3 months ago

I think something's happening in the ThreadPoolExecutor. I added some logging, and it doesn't seem to get past the for loop; it stalls out forever (or gives the SIGKILL message in "PRD").
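To turn a silent stall into a visible error, one option is to log progress through the loop and put a deadline on it. A minimal sketch, assuming a hypothetical `fetch` callable: `concurrent.futures.as_completed` raises `TimeoutError` if the futures are not all finished within `timeout` seconds. Shutting down with `wait=False` keeps the task from blocking on hung threads, but it does not kill them, so each HTTP call should also carry its own timeout (for example the `timeout=` argument of `requests.get`).

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

log = logging.getLogger(__name__)


def fetch_all_with_deadline(
    items: Sequence[T],
    fetch: Callable[[T], R],
    max_workers: int = 8,
    deadline_s: float = 600.0,
    log_every: int = 100,
) -> list[R]:
    """Run fetch() concurrently, logging progress and failing fast on a stall.

    Results are returned in completion order, not input order.
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    results: list[R] = []
    try:
        futures = [pool.submit(fetch, item) for item in items]
        # as_completed raises concurrent.futures.TimeoutError if the
        # deadline passes before every future has finished.
        for i, fut in enumerate(as_completed(futures, timeout=deadline_s), start=1):
            results.append(fut.result())
            if i % log_every == 0 or i == len(futures):
                log.info("completed %d/%d", i, len(futures))
    finally:
        # Don't block on hung threads; cancel anything not yet started (Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)
    log.info("exited the concurrent loop with %d results", len(results))
    return results
```

With this in place, the last "completed N/M" line shows how far the loop got before it stalled.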


jrlegrand commented 2 months ago

Need to try an asynchronous approach. Look into Python's asyncio library.
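A minimal asyncio sketch, assuming a hypothetical blocking `fetch(item)` (for example a `requests` call) that is pushed onto worker threads with `asyncio.to_thread`. An `asyncio.Semaphore` caps how many calls run at once. A native async HTTP client such as aiohttp or httpx would avoid threads entirely, but that is a separate dependency choice.

```python
import asyncio
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def fetch_all_async(
    items: Iterable[T],
    fetch: Callable[[T], R],
    limit: int = 8,
) -> list[R]:
    """Run a blocking fetch() for each item with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def one(item: T) -> R:
        async with sem:  # wait here if `limit` calls are already running
            # Run the blocking call in the default thread pool (Python 3.9+).
            return await asyncio.to_thread(fetch, item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(one(item) for item in items))


if __name__ == "__main__":
    print(asyncio.run(fetch_all_async(range(5), lambda x: x * x, limit=2)))
    # prints [0, 1, 4, 9, 16]
```

Note that this still creates one coroutine per item up front and keeps every result in the returned list, so for a very large list of API calls it may need to be combined with batching to actually reduce memory.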

I suspect that once concurrency is up and running with as many threads as it can handle, the handoff that reassigns the threads is causing it to break. It could also be a Docker issue, with the container choking on concurrent requests.
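To check the Docker angle, it can help to compare the container's memory limit with what the task actually uses (`docker stats` on the host during a run is the quickest way to watch usage). A minimal sketch that reads the limit from inside the container; the paths are the standard cgroup v2 (`/sys/fs/cgroup/memory.max`) and cgroup v1 (`/sys/fs/cgroup/memory/memory.limit_in_bytes`) locations, and which one exists depends on the host.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup locations; which one exists depends on the host's cgroup version.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def container_memory_limit_bytes() -> Optional[int]:
    """Return the container's memory limit in bytes, or None if unlimited/unknown.

    On cgroup v1, "no limit" shows up as a very large number rather than None.
    """
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file absent (other cgroup version, or not Linux)
        if raw == "max":
            return None  # cgroup v2 reports "max" when no limit is set
        return int(raw)
    return None


if __name__ == "__main__":
    limit = container_memory_limit_bytes()
    print("no limit found" if limit is None else f"limit: {limit / 2**30:.2f} GiB")
```

On Docker Desktop, the memory allocated to Docker's VM in its settings also caps all containers, which could explain why the same DAG behaves differently on different machines.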

jrlegrand commented 4 weeks ago

@NTBTI FYI - this is the bug you ran into.