apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

Chart cache warmup fails with 405 Method Not Allowed #28705

Closed rmasters closed 2 months ago

rmasters commented 6 months ago

Bug description

We found our chart cache warm-up task (as in the example in the Kubernetes docs) was failing with werkzeug.exceptions.MethodNotAllowed: 405 Method Not Allowed.

I found a similar mention of this problem on Slack: https://apache-superset.slack.com/archives/C01SS4DNYG5/p1715183775388149

The root cause is that commit 56069b05f9cf4d0c725d1b4b0ad6038b50837cd4 accidentally partially reverted the cache warm-up URL, so the task now sends a PUT request to the deprecated GET endpoint.

I have a fix for this, which I will PR shortly (#28706).
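
To illustrate the mismatch described above, here is a minimal standard-library sketch, not Superset code. It starts a throwaway local server with two routes and sends the same PUT to each. `LEGACY_PATH` is a hypothetical stand-in for the deprecated GET-only endpoint; `API_PATH` matches the chart API path used elsewhere in this thread. The handler and helper names are made up for the example.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request
from urllib.error import HTTPError

# Hypothetical stand-in for the deprecated endpoint (assumed path, GET only).
LEGACY_PATH = "/superset/warm_up_cache/"
# Path of the current chart API endpoint (PUT only).
API_PATH = "/api/v1/chart/warm_up_cache"


class Handler(BaseHTTPRequestHandler):
    def _reply(self, status, body):
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_PUT(self):
        # Read and discard the request body before answering.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        if self.path == API_PATH:
            self._reply(200, {"result": "warmed"})
        elif self.path == LEGACY_PATH:
            # The route exists, but PUT is not among its allowed methods.
            self._reply(405, {"message": "Method Not Allowed"})
        else:
            self._reply(404, {"message": "Not Found"})

    def log_message(self, *args):
        pass  # keep the demo quiet


def put_status(base_url, path):
    """Send a PUT with a small JSON body and return the HTTP status code."""
    req = request.Request(
        base_url + path,
        data=b'{"chart_id": 1}',
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    # Ignore any system proxy settings for this local demo.
    opener = request.build_opener(request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status
    except HTTPError as err:
        return err.code


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return put_status(base, LEGACY_PATH), put_status(base, API_PATH)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())  # (405, 200)
```

The payload and method are identical in both calls; only the target path changes the outcome, which is consistent with a URL regression rather than a wrong HTTP method.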

How to reproduce the bug

  1. Ensure you have a cache-warmup task scheduled, as in the CeleryConfig under "Additional context" below
  2. Ensure you have at least one dashboard with a data-backed chart
  3. Monitor the Celery worker logs

Screenshots/recordings

No response

Superset version

4.0.1

Python version

3.10

Node version

Not applicable

Browser

Not applicable

Additional context

Logs/stacktrace:

[2024-05-24 22:30:00,045: INFO/ForkPoolWorker-1] cache-warmup[ae85f28c-acfa-4bb1-a885-e7f0121610b9]: Loading strategy
[2024-05-24 22:30:00,045: INFO/ForkPoolWorker-1] cache-warmup[ae85f28c-acfa-4bb1-a885-e7f0121610b9]: Loading TopNDashboardsStrategy
[2024-05-24 22:30:00,046: INFO/ForkPoolWorker-1] cache-warmup[ae85f28c-acfa-4bb1-a885-e7f0121610b9]: Success!
HTTPException
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1788, in dispatch_request
    self.raise_routing_exception(req)
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1770, in raise_routing_exception
    raise request.routing_exception # type: ignore
File "/usr/local/lib/python3.10/site-packages/flask/ctx.py", line 351, in match_request
    result = self.url_adapter.match(return_rule=True) # type: ignore
File "/usr/local/lib/python3.10/site-packages/werkzeug/routing/map.py", line 619, in match
    raise MethodNotAllowed(valid_methods=list(e.have_match_for)) from None
werkzeug.exceptions.MethodNotAllowed: 405 Method Not Allowed: The method is not allowed for the requested URL.
[repeated]

We run with the following CeleryConfig:

from celery.schedules import crontab  # needed for the beat_schedule entry below

class CeleryConfig(object):
    broker_url = celery_redis_broker_url
    imports = (
        "superset.sql_lab",
        "superset.tasks.scheduler",
    )
    result_backend = celery_redis_result_url
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
    }
    beat_schedule = {
        [snip]
        "cache-warmup-hourly": {
            "task": "cache-warmup",
            "schedule": crontab(minute="*/30", hour="*"),
            "kwargs": {
                "strategy_name": "top_n_dashboards",
                "top_n": 10,
                "since": "7 days ago",
            },
        },
    }
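
A side note on the schedule above: although the entry is named "cache-warmup-hourly", `minute="*/30"` selects every 30th minute, which matches the 22:30:00 timestamps in the log. The sketch below shows how such a step expression expands. `expand_step` is a hypothetical helper written for this illustration, not Celery's parser.

```python
def expand_step(expr, upper):
    """Expand a '*/n' cron step expression over the range 0..upper-1."""
    if not expr.startswith("*/"):
        raise ValueError("only '*/n' expressions are handled in this sketch")
    step = int(expr[2:])
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return list(range(0, upper, step))


if __name__ == "__main__":
    # Minutes of the hour matched by minute="*/30".
    print(expand_step("*/30", 60))  # [0, 30]
```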


dosubot[bot] commented 6 months ago

Hey there, @rmasters! I'm a bot designed to help you out with bugs, questions, and becoming a contributor. I'm here to assist you with the issue you've encountered while waiting for a human maintainer. Let's tackle this together!

To fix the "405 Method Not Allowed" error in the chart cache warm-up task for Superset version 4.0.1, ensure that the HTTP method used for the /warm_up_cache endpoint is PUT. The endpoint is defined to accept only PUT requests.

Here is the relevant code snippet from superset/charts/api.py:

@expose("/warm_up_cache", methods=("PUT",))
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
    action=lambda self, *args, **kwargs: f"{self.__class__.__name__}"
    f".warm_up_cache",
    log_to_statsd=False,
)
def warm_up_cache(self) -> Response:
    """Warm up the cache for the chart.
    ---
    put:
      summary: Warm up the cache for the chart
      description: >-
        Warms up the cache for the chart.
        Note for slices a force refresh occurs.
        In terms of the `extra_filters` these can be obtained from records in the JSON
        encoded `logs.json` column associated with the `explore_json` action.
      requestBody:
        description: >-
          Identifies the chart to warm up cache for, and any additional dashboard or
          filter context to use.
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/ChartCacheWarmUpRequestSchema"
      responses:
        200:
          description: Each chart's warmup status
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ChartCacheWarmUpResponseSchema"
        400:
          $ref: '#/components/responses/400'
        404:
          $ref: '#/components/responses/404'
        500:
          $ref: '#/components/responses/500'
    """
    try:
        body = ChartCacheWarmUpRequestSchema().load(request.json)
    except ValidationError as error:
        return self.response_400(message=error.messages)
    try:
        result = ChartWarmUpCacheCommand(
            body["chart_id"],
            body.get("dashboard_id"),
            body.get("extra_filters"),
        ).run()
        return self.response(200, result=[result])
    except CommandException as ex:
        return self.response(ex.status, message=ex.message)

Ensure that your request to warm up the cache uses the PUT method. For example, using curl:

curl -X PUT -H "Content-Type: application/json" -d '{"chart_id": 1, "dashboard_id": 2, "extra_filters": []}' http://your-superset-url/api/v1/chart/warm_up_cache

Or using Python's requests library:

import requests

url = "http://your-superset-url/api/v1/chart/warm_up_cache"
data = {
    "chart_id": 1,
    "dashboard_id": 2,
    "extra_filters": []
}
headers = {
    "Content-Type": "application/json"
}

response = requests.put(url, json=data, headers=headers)
print(response.status_code, response.json())

Additionally, ensure that the cache warm-up task is correctly configured in superset/tasks/cache.py. The fetch_url function should use the PUT method when making the request:

@celery_app.task(name="fetch_url")
def fetch_url(data: str, headers: dict[str, str]) -> dict[str, str]:
    """
    Celery job to fetch url
    """
    result = {}
    try:
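        # Per the issue report, "Superset.warm_up_cache" resolves to the deprecated
        # GET-only endpoint, so the PUT request built below is what triggers the 405.
        url = get_url_path("Superset.warm_up_cache")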
        logger.info("Fetching %s with payload %s", url, data)
        req = request.Request(
            url, data=bytes(data, "utf-8"), headers=headers, method="PUT"
        )
        response = request.urlopen(  # pylint: disable=consider-using-with
            req, timeout=600
        )
        logger.info(
            "Fetched %s with payload %s, status code: %s", url, data, response.code
        )
        if response.code == 200:
            result = {"success": data, "response": response.read().decode("utf-8")}
        else:
            result = {"error": data, "status_code": response.code}
            logger.error(
                "Error fetching %s with payload %s, status code: %s",
                url,
                data,
                response.code,
            )
    except URLError as err:
        logger.exception("Error warming up cache!")
        result = {"error": data, "exception": str(err)}
    return result

Make sure the request method is PUT to avoid the "405 Method Not Allowed" error [1][2].
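
One detail worth checking against the `fetch_url` snippet above: `urllib.request.urlopen` raises `HTTPError` for 4xx/5xx responses, and `HTTPError` is a subclass of `URLError`. A 405 would therefore be handled by the `except URLError` branch rather than the `else` branch that inspects `response.code`. The sketch below shows one way to tell the two failure kinds apart. `summarize_failure` and the example URL are hypothetical and not part of Superset.

```python
from urllib.error import HTTPError, URLError


def summarize_failure(err):
    """Turn a urllib error into a small result dict (illustrative only)."""
    if isinstance(err, HTTPError):
        # The server answered, but with a 4xx/5xx status such as 405.
        return {"error": "http", "status_code": err.code, "reason": err.reason}
    # No HTTP response at all (DNS failure, refused connection, ...).
    return {"error": "network", "reason": str(err.reason)}


# Errors constructed by hand so the example needs no network access.
http_err = HTTPError(
    "http://superset.example/warm_up_cache", 405, "Method Not Allowed", {}, None
)
net_err = URLError("connection refused")

if __name__ == "__main__":
    print(issubclass(HTTPError, URLError))
    print(summarize_failure(http_err))
    print(summarize_failure(net_err))
```

Checking for `HTTPError` first keeps the status code available for logging, which a bare `str(err)` on a `URLError` only exposes as text.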


guga-kudos commented 5 months ago

We are also experiencing this problem here after upgrading from 2.x to 3.1.3. Will the changes be backported? 😃

rusackas commented 2 months ago

> Will the changes be backported?

While possible, it's not likely. Release managers are essentially volunteering their time, and are focusing efforts on 4.1 and 5.0 for the moment. I'd recommend upgrading :)