apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

Report Celery Job is always scheduled 12 hours ahead #27952

Open nathan-gilbert opened 7 months ago

nathan-gilbert commented 7 months ago

Bug description

The problem is in the title. No matter what timezone I select, the report job gets scheduled 12 hours ahead.

In the Superset UI, the timezone displayed on the created report job is always different from the one I selected (typically GMT -6): image

In Flower, I can see the jobs and their ETA looks correct, but the job doesn't get executed at that time. It runs exactly 12 hours later: image

I'm running Superset/Celery on Ubuntu 22.04 with Python 3.11. Not in Docker, just directly on the host.

Here's my celery config:

class CeleryConfig:
    """
    Celery worker configuration
    """

    broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    imports = (
        "superset.sql_lab",
        "superset.tasks.scheduler",
    )
    result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    worker_prefetch_multiplier = 10
    task_protocol = 2
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
        "tasks.add": {
            "rate_limit": "10/s",
        },
        "email_reports.send": {
            "rate_limit": "1/s",
            "time_limit": 120,
            "soft_time_limit": 150,
            "ignore_result": True,
        },
    }
    beat_schedule = {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute="0", hour="0"),
        },
    }
    timezone = "America/Denver"
    enable_utc = False

How to reproduce the bug

Turn on Alerts/reports.

Add a new report and select GMT -6 as the timezone. Create the job. The worker logs show that the job won't execute for another 12 hours.

Screenshots/recordings

Celery worker logs: image

Superset version

3.1.2

Python version

3.11

Node version

Not applicable

Browser

Chrome

Additional context

No response


tridoxx commented 6 months ago

Hi @nathan-gilbert, did you find a solution to this problem? I have the same issue. Thanks for your comment.

nathan-gilbert commented 6 months ago

@tridoxx No, I have found no solution yet.

dosubot[bot] commented 6 months ago

It looks like the issue you're experiencing with Celery jobs in Superset being scheduled 12 hours ahead could be due to a timezone misalignment, particularly in how timezones are handled and converted within Superset's scheduling logic. This misalignment might stem from the timezone setting in the Superset UI not being correctly recognized or applied when calculating the next execution times for jobs.

To address this, ensure that the timezone you set in the Superset UI is correctly formatted and recognized by the pytz library, as any incorrect or unrecognized timezone will default to UTC. Additionally, you might need to review and possibly customize the scheduling logic in Superset, especially if the default behavior doesn't align with your expectations. This could involve adjusting how datetime values are localized to the specified timezone and ensuring accurate conversion back to UTC if necessary.

There's also a related issue and suggested solution that involves replacing datetime.utcnow() with datetime.now(timezone.utc) in the scheduler code, which has been discussed in the Superset GitHub repository. This change aims to correct the timezone handling and might be a good starting point for resolving the scheduling discrepancies you're facing.
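As a rough illustration of why that replacement can matter (a sketch only, not Superset's actual scheduler code): `datetime.utcnow()` returns a naive value with no timezone attached, so any code that later assumes it is local time shifts it by the local offset, whereas `datetime.now(timezone.utc)` returns an aware value that cannot be misread this way. The fixed `GMT_MINUS_6` offset and the helper names below are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offset standing in for the "GMT -6" choice in the report UI.
GMT_MINUS_6 = timezone(timedelta(hours=-6))


def as_aware_utc(naive_utc: datetime) -> datetime:
    """Attach UTC to a naive datetime that is known to hold a UTC wall-clock time."""
    return naive_utc.replace(tzinfo=timezone.utc)


def misread_as_local(naive_utc: datetime, local_tz: timezone) -> datetime:
    """Simulate a consumer that wrongly treats the naive UTC value as local time."""
    return naive_utc.replace(tzinfo=local_tz).astimezone(timezone.utc)


# A naive value shaped like what datetime.utcnow() returns (no tzinfo).
naive = datetime(2024, 4, 10, 12, 0)

correct = as_aware_utc(naive)
shifted = misread_as_local(naive, GMT_MINUS_6)
skew = shifted - correct
print(skew)  # 6:00:00

# An aware "now" carries its timezone with it.
aware_now = datetime.now(timezone.utc)
print(aware_now.tzinfo)  # UTC
```

A single misreading of this kind accounts for one offset's worth of skew. Whether the 12-hour delay reported here comes from this mechanism is not established by this sketch.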

If after these steps you're still encountering issues, expanding logging to include key variables like triggered_at, start_at, stop_at, and the final scheduled times can help identify where the mismatches occur, providing further insight into the problem and aiding in troubleshooting.
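A minimal sketch of such logging, assuming the values are plain `datetime` objects: the helper marks each value as naive or aware and, for aware values, prints the UTC equivalent so mismatches stand out. The logger name and the `describe` and `log_schedule_state` helpers are hypothetical and not part of Superset.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("report_schedule_debug")  # hypothetical logger name


def describe(name: str, value: datetime) -> str:
    """Format a datetime with an explicit marker for naive values."""
    if value.tzinfo is None or value.utcoffset() is None:
        return f"{name}={value.isoformat()} (naive, no timezone attached)"
    as_utc = value.astimezone(timezone.utc).isoformat()
    return f"{name}={value.isoformat()} (aware, UTC equivalent {as_utc})"


def log_schedule_state(**values: datetime) -> list[str]:
    """Log every named datetime and return the formatted lines."""
    lines = [describe(name, value) for name, value in values.items()]
    for line in lines:
        logger.info(line)
    return lines


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_schedule_state(
        triggered_at=datetime(2024, 4, 10, 12, 0),  # naive
        start_at=datetime(2024, 4, 10, 6, 0, tzinfo=timezone(timedelta(hours=-6))),
    )
```

Seeing a naive value next to an aware one in the same log line is often the quickest hint about where a conversion is being skipped or applied twice.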

#### Details

**Sources**
- [Timezone Misalignment in Apache Superset 3.0.1 Alert&Reports](https://github.com/apache/superset/issues/26962)
- [superset/tasks/scheduler.py](https://github.com/apache/superset/blob/master/superset/tasks/scheduler.py)
- [superset/tasks/cron_util.py](https://github.com/apache/superset/blob/master/superset/tasks/cron_util.py)

tridoxx commented 6 months ago

Hi @nathan-gilbert, I made this correction. First you need to modify the file "superset/tasks/scheduler.py". In my case I am using Docker, and for some strange reason I need to modify this file inside the "superset_worker" container, not the "superset_app" container. I edited it as follows.

Add:

    import pytz
    utc_now = datetime.utcnow()

and change the line

    async_options = {"eta": schedule}

to

    async_options = {"eta": utc_now}

Something like this:

image

Then reboot the machine. This works for me: Superset takes the real UTC time and subtracts the correct UTC offset for the timezone defined directly on the alert in the Superset app.

It is working for me now without problems. To check that it is correct, set the alert's crontab in Superset to run every minute, and use the following command to check the logs if you are using the Docker "superset_worker" container: "docker logs superset_worker --since 1h"
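To make the effect of that change concrete, here is a small sketch that does not use Celery at all. It assumes only that a worker waits until a task's ETA and runs a task whose ETA is now or in the past as soon as it can. The `seconds_until` helper and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def seconds_until(eta: datetime, now: datetime) -> float:
    """Return how long a worker would wait before running a task with this ETA.

    Both arguments must be timezone-aware; a past or current ETA means "run now".
    """
    return max((eta - now).total_seconds(), 0.0)


now = datetime(2024, 4, 10, 12, 0, tzinfo=timezone.utc)

# Hypothetical ETA that ended up 12 hours late, as described in this issue.
late_eta = now + timedelta(hours=12)
print(seconds_until(late_eta, now))  # 43200.0

# The workaround passes the current UTC time as the ETA, so the wait is zero.
print(seconds_until(now, now))  # 0.0
```

In other words, with the ETA set to the current time the task is due as soon as the beat scheduler triggers it, and the computed `schedule` value no longer affects when it runs. That may be acceptable as a workaround, but it sidesteps the timezone conversion rather than fixing it.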