datarootsio / terraform-aws-ecs-airflow

A terraform module that creates an airflow instance in AWS ECS.
MIT License

Scheduler not running #38

Closed nyawanga closed 2 years ago

nyawanga commented 2 years ago

The scheduler does not appear to be running. Last heartbeat was received 5 hours ago. The DAGs list may not update, and new tasks will not be scheduled.

I keep getting the above error even though I set my scheduler to restart every hour, as suggested by other sources. Is there a better way to handle this issue?

dpfeif commented 2 years ago

Hi @nyawanga, can you please share the module version and parameters you're using? Also, has the setup ever worked for a period of time and then stopped, or was it dead from the start?

nyawanga commented 2 years ago

> Hi @nyawanga, can you please share the module version and parameters you're using? Also, has the setup ever worked for a period of time and then stopped, or was it dead from the start?

module "airflow" {

source = "datarootsio/ecs-airflow/aws"

version = "0.0.12"

// airflow airflow_executor = "Local" airflow_image_name = "apache/airflow" airflow_image_tag = "2.1.4" airflow_container_home = "/opt/airflow" airflow_example_dag = false airflow_py_requirements_path = "./orchestration/requirements.txt" airflow_variables = { "AIRFLOWWEBSERVERNAVBAR_COLOR" : "#4099de", "AIRFLOWSCHEDULERRUN_DURATION" : 3600, "AIRFLOWSCHEDULERMIN_FILE_PROCESS_INTERVAL" : 0, "AIRFLOWSCHEDULERDAG_DIR_LIST_INTERVAL" : 60, "AIRFLOWCOREDAG_CONCURRENCY": 3, "AIRFLOWCOREPARALLELISM": 5, "AIRFLOWCOREMAX_ACTIVE_RUNS_PER_DAG": 1,

At first I set AIRFLOW__SCHEDULER__RUN_DURATION to -1 and got the error; then I set it to 3600, but it appeared again.

nyawanga commented 2 years ago

This is actually my second attempt at this in two weeks, and I have now hit this error twice. Is there a way I can set the scheduler health check to be longer with this module?
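
For context: as far as I know, AIRFLOW__SCHEDULER__RUN_DURATION was removed in Airflow 2.0, so it likely has no effect on the 2.1.4 image configured above. The warning banner itself is driven by Airflow's [scheduler] scheduler_health_check_threshold option (default 30 seconds). Below is a minimal sketch of raising it through this module's airflow_variables map, assuming the module forwards these entries to the containers as Airflow environment variables, as the configuration above suggests:

module "airflow" {
  source  = "datarootsio/ecs-airflow/aws"
  version = "0.0.12"

  airflow_variables = {
    // Seconds since the last scheduler heartbeat before the webserver
    // reports the scheduler as unhealthy (Airflow default: 30).
    "AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD" : 300,
  }
}

Note that this only delays the warning; if the scheduler container has actually died, a longer threshold will not bring it back.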

dpfeif commented 2 years ago

I'm still not sure I understand the issue. Do you mean that it sometimes works and sometimes does not? Can you take a look at the logs of the scheduler on startup? You should see it running a pip install. Also, can you please share the content of your requirements.txt?

nyawanga commented 2 years ago

This is my second week implementing Airflow. The first time I set it up, it took 2 days before I got the stated error. When I researched the issue, most cases pointed to changing the executor type from Sequential to Local, which does not apply here since, as indicated, I have airflow_executor = "Local". Other links, such as https://www.astronomer.io/blog/7-common-errors-to-check-when-debugging-airflow-dag, suggest restarting the scheduler regularly, but that does not seem to have worked either, as I ran into the same issue yesterday.

My requirements.txt :

sdc-orchestration-helpers[airflow2]==0.4.1 https://pypi.org/project/sdc-helpers/
dag-factory
awscli

dpfeif commented 2 years ago

Is that URL actually on the first line of requirements.txt? Because that would trigger an error.

sdc-orchestration-helpers[airflow2]==0.4.1
dag-factory
awscli

Should work just fine.

nyawanga commented 2 years ago

I added the link to show you the source of sdc_helpers, but I use it exactly as you have stated. The Airflow instance actually starts and runs fine for a day or two before we get the message below on the dashboard:

The scheduler does not appear to be running. Last heartbeat was received 5 hours ago. The DAGs list may not update, and new tasks will not be scheduled.

dpfeif commented 2 years ago

This looks like an issue in your Airflow configuration / DAG code, not an issue with the deployment or this module. I'm therefore closing it; happy to reopen if you can provide some logs pointing at something wrong here.