apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

4.1.0rc1 celery issue - Received unregistered task of type 'reports.scheduler'. #29708

Closed: padbk closed this issue 1 month ago.

padbk commented 1 month ago

Bug description

Getting the following every minute on the worker node:

[2024-07-26 09:46:17,568: ERROR/MainProcess] Received unregistered task of type 'reports.scheduler'.
The message has been ignored and discarded.

Did you remember to import the module containing this task?
Or maybe you're using relative imports?

Please see
https://docs.celeryq.dev/en/latest/internals/protocol.html
for more information.

The full contents of the message body was:
b'[[], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (77b)

The full contents of the message headers:
{'lang': 'py', 'task': 'reports.scheduler', 'id': '2ef01d9c-99ce-47e5-92b5-aeccc773aa66', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '2ef01d9c-99ce-47e5-92b5-aeccc773aa66', 'parent_id': None, 'argsrepr': '()', 'kwargsrepr': '{}', 'origin': 'gen41@superset-celerybeat-xxxxxx', 'ignore_result': False, 'replaced_task_nesting': 0, 'stamped_headers': None, 'stamps': {}}

The delivery info for this task is:
{'exchange': '', 'routing_key': 'celery'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 659, in on_task_received
    strategy = strategies[type_]
KeyError: 'reports.scheduler'

And no reports run at all.

I don't have this issue in 4.0.2 with the same config.
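The traceback ends with `strategies[type_]` raising `KeyError`. The worker looks up the incoming task name in a table built from the tasks it registered at startup, and 'reports.scheduler' is not in that table. The code below is a simplified, hypothetical model of that lookup in plain Python. It is not Celery's actual consumer code, and `Worker`, `register` and `on_message` are illustrative names:

```python
from typing import Callable, Dict, List


class Worker:
    """Toy stand-in for a Celery worker that dispatches messages by task name."""

    def __init__(self) -> None:
        # Maps task names (e.g. "reports.scheduler") to handler functions.
        self.registry: Dict[str, Callable[[], str]] = {}
        # Names of tasks that arrived with no registered handler.
        self.discarded: List[str] = []

    def register(self, name: str, func: Callable[[], str]) -> None:
        self.registry[name] = func

    def on_message(self, task_name: str) -> str:
        try:
            handler = self.registry[task_name]
        except KeyError:
            # Mirrors the log above: the message is ignored and discarded.
            self.discarded.append(task_name)
            return "discarded"
        return handler()


worker = Worker()
# Only tasks from modules the worker actually imported end up registered.
worker.register("sql_lab.get_sql_results", lambda: "ran sql_lab.get_sql_results")

print(worker.on_message("sql_lab.get_sql_results"))  # ran sql_lab.get_sql_results
print(worker.on_message("reports.scheduler"))        # discarded
```

Beat keeps enqueuing 'reports.scheduler' every minute, and the worker keeps discarding it. That is why no reports run, even though nothing crashes.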

How to reproduce the bug

Installed 4.1.0rc1-py310 on k8s with the following Celery config:

from datetime import timedelta

from celery.schedules import crontab

# REDIS_HOST, REDIS_PORT and REDIS_CELERY_DB are assumed to be defined
# earlier in superset_config.py (not shown in this report).

class CeleryConfig:
    broker_url = 'rediss://%s:%s/%s?ssl_cert_reqs=CERT_NONE' % (REDIS_HOST, REDIS_PORT, REDIS_CELERY_DB)
    imports = ('superset.sql_lab', "superset.tasks", "superset.tasks.thumbnails", )
    result_backend = 'rediss://%s:%s/%s?ssl_cert_reqs=CERT_NONE' % (REDIS_HOST, REDIS_PORT, REDIS_CELERY_DB)
    worker_log_level = "DEBUG"
    worker_prefetch_multiplier = 4
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {"rate_limit": "100/s"},
        "email_reports.send": {
            "rate_limit": "1/s",
            "time_limit": int(timedelta(seconds=600).total_seconds()),
            "soft_time_limit": int(timedelta(seconds=600).total_seconds()),
            "ignore_result": True,
        },
    }
    beat_schedule = {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": 59.95,
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=0, hour=0),
        },
        'cache-warmup-hourly': {
            'task': 'cache-warmup',
            'schedule': crontab(minute=26, hour='*'),  # @hourly
            'kwargs': {
                'strategy_name': 'top_n_dashboards',
                'top_n': 20,
                'since': '7 days ago',
            },
        },
    }

CELERY_CONFIG = CeleryConfig
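One quick way to spot this class of problem is to compare the task names in `beat_schedule` against the modules listed in `imports`. The sketch below does that in plain Python. `TASK_MODULES` is a hand-written mapping, assumed from the module names discussed in this thread, of task names to the Superset modules that define them. It is not read from Superset itself. The check is deliberately strict: a module can still be imported indirectly (which is presumably why the same config worked in 4.0.2), so a flagged task is a warning rather than proof.

```python
from typing import Dict, Iterable, List, Mapping

# Assumption: which Superset module defines each beat task (per this thread).
TASK_MODULES: Dict[str, str] = {
    "reports.scheduler": "superset.tasks.scheduler",
    "reports.prune_log": "superset.tasks.scheduler",
    "cache-warmup": "superset.tasks.cache",
}


def unregistered_beat_tasks(
    imports: Iterable[str], beat_schedule: Mapping[str, Mapping[str, object]]
) -> List[str]:
    """Return beat task names whose defining module is not listed in `imports`.

    Only exact module names count: importing a package such as
    "superset.tasks" does not import its submodules unless the
    package's __init__.py does so itself.
    """
    imported = set(imports)
    missing = []
    for entry in beat_schedule.values():
        task = str(entry["task"])
        module = TASK_MODULES.get(task)
        if module is not None and module not in imported:
            missing.append(task)
    return missing


# The imports tuple from the config above and its beat task names
# (schedules omitted because only the "task" key is checked).
old_imports = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
beat = {
    "reports.scheduler": {"task": "reports.scheduler"},
    "reports.prune_log": {"task": "reports.prune_log"},
    "cache-warmup-hourly": {"task": "cache-warmup"},
}
print(unregistered_beat_tasks(old_imports, beat))
# ['reports.scheduler', 'reports.prune_log', 'cache-warmup']
```

With the corrected imports tuple that padbk posts later in the thread, the same check returns an empty list.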

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.10

Node version

Not applicable

Browser

Not applicable

Additional context

No response


dosubot[bot] commented 1 month ago

Hey there, @padbk! 👋 I'm here to assist with any bugs, questions, or contributions you have regarding the repository. Let's tackle this together!

To resolve the unregistered task 'reports.scheduler' error that appears every minute on the worker node with Superset 4.1.0rc1 on Kubernetes, make sure the task is properly registered in your Celery configuration. The error means the worker does not recognize 'reports.scheduler', which suggests that the module defining it is not included in the imports section of your Celery configuration.

Here is an example of how you can configure your Celery settings to include the 'reports.scheduler' task:

configOverrides:
  celery_conf: |
    from celery.schedules import crontab
    class CeleryConfig:
      broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
      imports = (
          "superset.sql_lab",
          "superset.tasks.cache",
          "superset.tasks.scheduler",  # Ensure this line is included
      )
      result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
      task_annotations = {
          "sql_lab.get_sql_results": {
              "rate_limit": "100/s",
          },
      }
      beat_schedule = {
          "reports.scheduler": {
              "task": "reports.scheduler",
              "schedule": crontab(minute="*", hour="*"),
          },
          "reports.prune_log": {
              "task": "reports.prune_log",
              'schedule': crontab(minute=0, hour=0),
          },
          'cache-warmup-hourly': {
              "task": "cache-warmup",
              "schedule": crontab(minute="*/30", hour="*"),
              "kwargs": {
                  "strategy_name": "top_n_dashboards",
                  "top_n": 10,
                  "since": "7 days ago",
              },
          }
      }
    CELERY_CONFIG = CeleryConfig

Make sure the superset.tasks.scheduler module is listed in the imports section; this should resolve the unregistered task error. For reference, the reports.scheduler task is defined in that module along these lines:

from superset.extensions import celery_app

@celery_app.task(name="reports.scheduler")
def scheduler() -> None:
    """
    Celery beat main scheduler for reports
    """
    # Task implementation

This should ensure that the task is recognized and properly registered by the Celery worker [1][2].


rusackas commented 1 month ago

Please feel free to mention this on the official vote thread if you'd like to draw more attention to it, and see if others are experiencing the same. The release is currently in testing/voting, so it'd be appreciated :) Let me know if you're not on the developer mailing list and I can provide more instructions.

padbk commented 1 month ago

Turns out the imports line I was using was out of date and needed to be changed. Mine now reads:

imports = ("superset.sql_lab", "superset.tasks.scheduler", "superset.tasks.thumbnails", "superset.tasks.cache", )

Not sure what changed to make "superset.tasks" stop working in 4.1.0, but the docs seem up to date.
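One plausible mechanism follows from Python's package semantics. This thread does not confirm it as the actual cause. Importing a package runs only its `__init__.py`, so a task defined in a submodule is registered only if something imports that submodule. The sketch below builds a throwaway package on disk to show this. The package name `demo_tasks`, its `registry` dict and its `task` decorator are hypothetical stand-ins for `superset.tasks` and `@celery_app.task`:

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package: demo_tasks/__init__.py holds a registry and a
# decorator; demo_tasks/scheduler.py registers a task when it is imported.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_tasks"
pkg.mkdir()
(pkg / "__init__.py").write_text(textwrap.dedent("""
    registry = {}

    def task(name):
        def decorator(func):
            registry[name] = func  # registration happens at import time
            return func
        return decorator
"""))
(pkg / "scheduler.py").write_text(textwrap.dedent("""
    from demo_tasks import task

    @task(name="reports.scheduler")
    def scheduler():
        return "scheduled"
"""))
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Like imports = ("superset.tasks",): only __init__.py runs.
package = importlib.import_module("demo_tasks")
registered_before = "reports.scheduler" in package.registry
print(registered_before)  # False

# Like imports = ("superset.tasks.scheduler",): the decorator now runs.
importlib.import_module("demo_tasks.scheduler")
registered_after = "reports.scheduler" in package.registry
print(registered_after)  # True
```

Listing the submodule explicitly, as in the corrected imports line, does not depend on what the package's `__init__.py` happens to import in a given release.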

sfirke commented 1 month ago

I had the same issue, and it went away after making the one-line change you suggested. Thanks! I have been using this same config without issues since 2.0.0, so I guess something changed in 4.1.0 that finally broke it.

mistercrunch commented 1 month ago

I noticed warnings about this in docker-compose last week and connected them to this issue while looking at 4.1 blockers.

I think I'm fixing the root cause here: https://github.com/apache/superset/pull/29862