apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Gantt chart flickering and constant rescaling #42215

Closed adamgorkaextbi closed 1 hour ago

adamgorkaextbi commented 2 months ago

Apache Airflow version

2.10.1

If "Other Airflow 2 version" selected, which one?

No response

What happened?

The Gantt chart flickers because it is constantly rescaling. The "Queued at" time is computed incorrectly: it is shifted by +2h relative to the DAG run's start and end times. [screenshots]

What you think should happen instead?

I should see a correct Gantt chart, or at least one that does not flicker.

How to reproduce

We migrated from Airflow 2.7 to 2.9.3 (same Gantt issue) and then to 2.10.1.

Operating System

Airflow Docker release

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

Helm chart

Anything else?

We migrated from Airflow 2.7 to 2.9.3 (same Gantt issue) and then to 2.10.1.

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 2 months ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

Shlomixg commented 2 months ago

Same here; it occurs after upgrading from 2.8.2 to 2.10.1.

adamgorkaextbi commented 2 months ago

Still the same issue on version 2.10.2.

adamgorkaextbi commented 2 months ago

There is a nice video in the duplicate issue: https://github.com/apache/airflow/issues/42243

The width of the elements representing the bars changes in an infinite loop, and their class name also changes in a loop. Setting a breakpoint on attribute modification leads to this code: [screenshot]

dannyl1u commented 1 month ago

Hi @adamgorkaextbi, are there any specific steps to reproduce this? I'm looking into this but can't seem to reproduce the issue on my end. Thanks!

adamgorkaextbi commented 1 month ago

@dannyl1u How to reproduce:

- Use the official Airflow image and Helm chart.
- The database backend is PostgreSQL.
- Airflow uses UTC, and the server runs in UTC.
- Users work in CEST (UTC+2).
- DAGs are run via external triggers.
- Migrate from Airflow 2.7 (which has its own Gantt chart issues) to 2.9.3 (where the flickering appears), then to 2.10.1, and then to 2.10.2.

In my opinion, this leads to a situation where incorrect "Queued at" values (both old and new ones) are generated. Since Airflow 2.7, "Queued at" has had an incorrect value, 2h after the start date, whereas it should occur before the start date. This may lead the front end to incorrectly and constantly recompute the positions (padding and width) of the tasks on the Gantt chart.
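To see why a "Queued at" that lands after the run's start could upset the layout, here is a minimal sketch of how a Gantt bar's offset (padding) and width might be computed as fractions of the chart window. It assumes the window runs from the run's queued_at to its end_date; barGeometry and its parameters are hypothetical, not Airflow's actual code.

```typescript
interface BarGeometry {
  left: number;  // offset from the window start, as a fraction of the window
  width: number; // bar length, as a fraction of the window
}

// Hypothetical layout helper. All arguments are epoch milliseconds.
// Returns null when the window is empty or inverted (e.g. queued_at after end_date).
function barGeometry(
  windowStart: number,
  windowEnd: number,
  taskStart: number,
  taskEnd: number,
): BarGeometry | null {
  const span = windowEnd - windowStart;
  if (span <= 0) return null;
  return {
    left: (taskStart - windowStart) / span,
    width: (taskEnd - taskStart) / span,
  };
}

const ms = (iso: string): number => Date.parse(iso);

// Sane run: queued 10:00, task runs 10:02-10:10, run ends 10:10.
console.log(barGeometry(ms("2024-09-12T10:00:00Z"), ms("2024-09-12T10:10:00Z"),
                        ms("2024-09-12T10:02:00Z"), ms("2024-09-12T10:10:00Z")));
// { left: 0.2, width: 0.8 }

// Same run with queued_at shifted by +2h: it now lies after end_date.
console.log(barGeometry(ms("2024-09-12T12:00:00Z"), ms("2024-09-12T10:10:00Z"),
                        ms("2024-09-12T10:02:00Z"), ms("2024-09-12T10:10:00Z")));
// null
```

With a shifted queued_at there is no meaningful geometry at all, so any layout code that keeps trying to "fix" the window has nothing stable to converge on.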

adamgorkaextbi commented 1 month ago

I have checked, and queued_dttm (displayed as "Queued at") has the correct value stored in the database (a datetime with time zone). I suspect the front end is processing or receiving the queued_dttm used for the displayed "Queued at" without taking this field's time zone into account. That would explain why our web UI shows +2h and why the Gantt chart goes crazy.
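The +2h shift is exactly what you get when a wall-clock time written in CEST (UTC+2) is stored without an offset and later read back as if it were UTC. The sketch below shows that arithmetic in isolation; the helper names and sample values are hypothetical, and it makes no claim about where in Airflow's stack the misreading actually happens.

```typescript
// Correct reading: attach the offset the naive value was written with.
function readWithOffset(naive: string, offset: string): Date {
  return new Date(`${naive}${offset}`);
}

// Faulty reading: assume the naive value was already UTC.
function readAsUtc(naive: string): Date {
  return new Date(`${naive}Z`);
}

// Naive wall-clock value, actually written in CEST (UTC+2).
const naiveQueuedAt = "2024-09-12T12:00:00";
// An aware value (like a timestamptz column) that is read correctly.
const startDate = new Date("2024-09-12T10:00:05Z");

const actual = readWithOffset(naiveQueuedAt, "+02:00");
const misread = readAsUtc(naiveQueuedAt);

console.log(actual.toISOString());  // 2024-09-12T10:00:00.000Z (before startDate)
console.log(misread.toISOString()); // 2024-09-12T12:00:00.000Z (about 2h after startDate)
console.log(misread.getTime() > startDate.getTime()); // true
```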

adamgorkaextbi commented 1 month ago

The task details and the Gantt chart both use the React hook useGridData() to get their data.

adamgorkaextbi commented 1 month ago

dannyl1u commented 1 month ago

@adamgorkaextbi @Shlomixg I've tried it on my DAGs (also on 2.10.0), and the Gantt charts are fine on my end. Could you please share your DAG file? If it contains sensitive information, a sanitized or similar version would be fine, as long as it still reproduces the Gantt chart problem.

adamgorkaextbi commented 1 month ago

@dannyl1u I guess you need to apply the Airflow migration scripts to reproduce the issue.

I checked the Airflow schema after migration one more time. Previously I had only checked the task_instance table, where queued_dttm has the correct type, timestamptz [screenshot]. Today I also checked the dag_run table schema, and queued_at there is not timestamptz [screenshot], although according to the Airflow DB model it should be. @dannyl1u, can you double-check the schema on your side?

We will try to change this column type manually in our DB and let you know whether that fixes the issue. Still, I suspect one of the migration scripts needs fixing.

adamgorkaextbi commented 1 month ago

If you are running Airflow in a time zone ahead of UTC (Europe/Asia), here is the solution. (US time zones are not affected by the flickering, but the data still has the wrong column type in the DB.)

SOLUTION:

"""
ALTER TABLE .dag_run
ALTER COLUMN queued_at TYPE timestamptz
USING queued_at AT TIME ZONE '';
"""

TIME ZONE VALIDATION SELECT QUERY (run it before the ALTER TABLE):

"""
SELECT id, dag_id, execution_date, state, run_id, end_date, start_date,
       queued_at, queued_at AT TIME ZONE '' AS queued_at_new
FROM dag_run
WHERE start_date IS NOT NULL
ORDER BY start_date DESC
LIMIT 100;
"""

TODO:

- Correct the Airflow DB migration script so that it sets the correct types for these columns during migration.
- Fix the UI React logic to deal with an incorrect timestamp order (start_date, queued_at, end_date), or to check the timestamp order and report an error instead of flickering or other unexpected behavior.
- Add unit tests for processing incorrect data on the front end in the Gantt chart.
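The second TODO item (check the timestamp order and report an error instead of flickering) could look roughly like this. It is a hypothetical helper written in TypeScript because the Gantt view lives in a .tsx file; the function name and the RunTimes shape are assumptions, not existing Airflow code.

```typescript
interface RunTimes {
  queuedAt?: string | null;
  startDate?: string | null;
  endDate?: string | null;
}

// Hypothetical validator: returns human-readable problems instead of letting
// the chart try to lay out an impossible timeline. Missing values are skipped.
function checkRunTimestampOrder(run: RunTimes): string[] {
  const problems: string[] = [];
  const parse = (label: string, value?: string | null): number | null => {
    if (value == null) return null;
    const t = Date.parse(value);
    if (Number.isNaN(t)) {
      problems.push(`${label} is not a valid date: ${value}`);
      return null;
    }
    return t;
  };

  const queued = parse("queued_at", run.queuedAt);
  const start = parse("start_date", run.startDate);
  const end = parse("end_date", run.endDate);

  if (queued !== null && start !== null && queued > start) problems.push("queued_at is after start_date");
  if (start !== null && end !== null && start > end) problems.push("start_date is after end_date");
  if (queued !== null && end !== null && queued > end) problems.push("queued_at is after end_date");
  return problems;
}

// A run whose queued_at was shifted by +2h.
console.log(checkRunTimestampOrder({
  queuedAt: "2024-09-12T12:00:00Z",
  startDate: "2024-09-12T10:00:05Z",
  endDate: "2024-09-12T10:10:00Z",
}));
// [ 'queued_at is after start_date', 'queued_at is after end_date' ]
```

The chart could render a message when the list is non-empty rather than attempting a layout.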

CharlieJ15420 commented 5 days ago

I'm also getting this issue on 2.10.3. It happens on seemingly random DAGs, with the Gantt UI timescale constantly flickering.

leetdavid commented 3 days ago

For me, this happens when tasks have multiple retries.

sokokoluhumbu commented 3 days ago

This works for me (in Paris):

ALTER TABLE dag_run ALTER COLUMN queued_at TYPE timestamptz USING queued_at AT TIME ZONE 'Europe/Paris';
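For a timestamp without time zone, PostgreSQL's AT TIME ZONE treats the stored value as local time in the named zone and returns the matching timestamptz, so the zone in the USING clause must be the one the naive values were actually written in. To preview what the conversion will produce for a few rows before running the ALTER TABLE, here is a minimal TypeScript sketch using only Intl. The helper names are hypothetical, and outside DST transitions it should agree with PostgreSQL; at the ambiguous or skipped hour of a DST switch it may not.

```typescript
// Offset of `timeZone` from UTC (in ms) at the given instant, read via Intl.
function zoneOffsetMs(instantMs: number, timeZone: string): number {
  const fmt = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    second: "2-digit",
  });
  const parts: { [type: string]: number } = {};
  for (const part of fmt.formatToParts(new Date(instantMs))) {
    if (part.type !== "literal") parts[part.type] = Number(part.value);
  }
  // Some engines report midnight as hour 24 when hour12 is false.
  const wallAsUtc = Date.UTC(parts.year, parts.month - 1, parts.day,
                             parts.hour % 24, parts.minute, parts.second);
  return wallAsUtc - Math.floor(instantMs / 1000) * 1000;
}

// Rough equivalent of `naive AT TIME ZONE zone`: interpret the naive wall-clock
// value as local time in `timeZone` and return the corresponding instant.
function naiveAtTimeZone(naive: string, timeZone: string): Date {
  const m = /^(\d{4})-(\d{2})-(\d{2})[T ](\d{2}):(\d{2}):(\d{2})$/.exec(naive);
  if (m === null) throw new Error(`unsupported timestamp: ${naive}`);
  const wallAsUtc = Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]),
                             Number(m[4]), Number(m[5]), Number(m[6]));
  // First guess with the offset at wallAsUtc, then re-check once so the result
  // uses the offset in force at the answer itself.
  let instant = wallAsUtc - zoneOffsetMs(wallAsUtc, timeZone);
  instant = wallAsUtc - zoneOffsetMs(instant, timeZone);
  return new Date(instant);
}

console.log(naiveAtTimeZone("2024-09-12 14:00:00", "Europe/Paris").toISOString()); // 2024-09-12T12:00:00.000Z (CEST)
console.log(naiveAtTimeZone("2024-01-15 14:00:00", "Europe/Paris").toISOString()); // 2024-01-15T13:00:00.000Z (CET)
```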

hongshaoyang commented 2 days ago

Airflow Version: v2.10.0
Git Version: .release:e001b88f5875cfd7e295891a0bbdbc75a3dccbfb
Deployment: Official Apache Airflow Helm Chart

https://github.com/user-attachments/assets/a71da12d-a18d-454b-b8a2-8b16fd7db9d5

darkag commented 2 days ago

Same problem on 2.10.3, but as @leetdavid said, it occurs only when there is a retried task. We're using MySQL for the database, so it doesn't seem to be just a timezone issue.

To me, it seems more related to the fact that the Gantt chart tries to show all the task executions present in task_instance for a run_id, but only within the time window between the dag_run's queued_at and end_date. (If I manually change queued_at so that this window encompasses all task instances, the flickering stops.)

The minimum date for the Gantt chart shouldn't be queued_at, but rather the minimum start_date of the task instances.
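Following that suggestion, the chart window could be derived from the data instead of from queued_at alone: start at the earliest of queued_at, the run's start_date and every try's start_date; end at the later of the run's end_date and the last try's end, or at "now" while the run is still going. A minimal sketch; ganttWindow, RunInfo and TaskTry are hypothetical names, not Airflow's.

```typescript
interface RunInfo {
  queuedAt: string | null;
  startDate: string | null;
  endDate: string | null;
}

interface TaskTry {
  startDate: string | null;
  endDate: string | null;
}

interface GanttWindow {
  start: number; // epoch ms
  end: number;   // epoch ms
}

const toMs = (value: string | null): number | null => {
  if (!value) return null;
  const t = Date.parse(value);
  return Number.isNaN(t) ? null : t;
};
const isMs = (t: number | null): t is number => t !== null;

// Hypothetical: a window that always contains every try, so no bar ever
// falls outside it, even when queued_at is wrong or retries started earlier.
function ganttWindow(run: RunInfo, tries: TaskTry[], now: number = Date.now()): GanttWindow | null {
  const starts = [run.queuedAt, run.startDate, ...tries.map((t) => t.startDate)]
    .map(toMs)
    .filter(isMs);
  if (starts.length === 0) return null; // nothing queued or started yet
  const start = Math.min(...starts);

  const runEnd = toMs(run.endDate);
  const tryEnds = tries.map((t) => t.endDate).map(toMs).filter(isMs);
  // A finished run ends at the later of end_date and the last try's end;
  // a running one extends to "now".
  const end = runEnd === null ? now : Math.max(runEnd, ...tryEnds);
  return { start, end: Math.max(start, end) };
}

// A run with an earlier try that lies before queued_at (e.g. after a retry).
const win = ganttWindow(
  { queuedAt: "2024-09-12T10:00:00Z", startDate: "2024-09-12T10:00:01Z", endDate: "2024-09-12T10:30:00Z" },
  [
    { startDate: "2024-09-12T09:50:00Z", endDate: "2024-09-12T09:55:00Z" },
    { startDate: "2024-09-12T10:05:00Z", endDate: "2024-09-12T10:20:00Z" },
  ],
);
console.log(win && new Date(win.start).toISOString()); // 2024-09-12T09:50:00.000Z
```

Because the window is a pure function of the run and its tries, it can be computed during render without storing it in state at all.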

darkag commented 1 day ago

The closed pull request above should solve the problem, but I don't understand what to do about the validation error message...

Basically, it comes down to this part of the airflow/www/static/js/dag/details/gantt/index.tsx file:

// Reset state when the dagrun changes
  useEffect(() => {
    if (startDate !== dagRun?.queuedAt && startDate !== dagRun?.startDate) {
      setStartDate(dagRun?.queuedAt || dagRun?.startDate);
    }
    if (!endDate || endDate !== dagRun?.endDate) {
      // @ts-ignore
      setEndDate(dagRun?.endDate ?? moment().add(1, "s").toString());
    }
  }, [
    dagRun?.queuedAt,
    dagRun?.startDate,
    dagRun?.endDate,
    startDate,
    endDate,
  ]);

This effect is triggered whenever startDate/endDate change, and it sets dagRun?.queuedAt || dagRun?.startDate as the start date. Doing so seems to trigger a redraw of the chart, which calls setGanttDuration for each task instance; that can change startDate/endDate again, which re-triggers the effect and enters an infinite loop of date changes.
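A simplified model of that feedback loop, assuming (as described above) that redrawing the rows pushes the window start back to the earliest task try while the "reset" effect pulls it forward to queuedAt. The function, its parameters and the one-change-per-render ordering are hypothetical simplifications, not React's real scheduling or Airflow's code. It compares the effect with startDate in its dependency list against a version without it, which is what the pull request described below tried.

```typescript
type Ms = number;

interface RunStartTimes {
  queuedAt: Ms | null;
  startDate: Ms | null;
}

// Returns the settled window start, or null if it is still changing after
// maxRenders renders (i.e. the state keeps flipping forever).
function settleWindowStart(
  run: RunStartTimes,
  earliestTryStart: Ms,
  resetDependsOnWindow: boolean,
  maxRenders = 100,
): Ms | null {
  let windowStart: Ms | null = null;
  let prevDeps: string | undefined = undefined;
  for (let render = 0; render < maxRenders; render++) {
    const deps = JSON.stringify(
      resetDependsOnWindow ? [run.queuedAt, run.startDate, windowStart] : [run.queuedAt, run.startDate],
    );
    const depsChanged = deps !== prevDeps;
    prevDeps = deps;

    // "Reset" effect: runs only when its dependencies changed, same condition as index.tsx.
    if (depsChanged && windowStart !== run.queuedAt && windowStart !== run.startDate) {
      windowStart = run.queuedAt || run.startDate;
      continue; // a state change causes another render
    }
    // "Redraw": stand-in for setGanttDuration widening the window to fit every try.
    if (windowStart === null || earliestTryStart < windowStart) {
      windowStart = earliestTryStart;
      continue;
    }
    return windowStart; // no state change in this render: settled
  }
  return null;
}

const queued = Date.parse("2024-09-12T10:00:00Z");
const run = { queuedAt: queued, startDate: queued + 1000 };
const firstTry = Date.parse("2024-09-12T09:50:00Z"); // a try that started before queued_at

console.log(settleWindowStart(run, firstTry, true));  // null: reset and redraw keep undoing each other
console.log(settleWindowStart(run, firstTry, false)); // settles on firstTry
```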

My pull request tried to solve the issue by removing startDate/endDate from the dependency list of the "reset" effect. It works if I rebuild the JavaScript on my local Airflow instance, but it doesn't pass the GitHub validation checks. It seems that just removing startDate/endDate makes the file inconsistent, and since my knowledge of React is almost zero, I will let someone with a better understanding fix this error.