apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

Chart cache warmup fails with 405 Method Not Allowed #28705

Open rmasters opened 2 months ago

rmasters commented 2 months ago

Bug description

We found our chart cache warm-up task (as in the example in the Kubernetes docs) was failing with werkzeug.exceptions.MethodNotAllowed: 405 Method Not Allowed.

I found a similar mention of this problem on Slack: https://apache-superset.slack.com/archives/C01SS4DNYG5/p1715183775388149

The root cause is that commit 56069b05f9cf4d0c725d1b4b0ad6038b50837cd4 accidentally partially reverted the cache warm-up URL, so the task now sends a PUT request to the deprecated GET-only endpoint.
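To illustrate the failure mode, here is a minimal, simplified model of method-aware routing. It is not Superset's or Werkzeug's code: the `ROUTES` table and `dispatch` helper are hypothetical, and the deprecated path `/superset/warm_up_cache/` is an assumption (only `/api/v1/chart/warm_up_cache` is quoted in this thread). The point is that a path which exists but does not accept the request's method produces a 405 rather than a 404.

```python
# Simplified stand-in for a method-aware router (hypothetical, not Werkzeug itself).
ROUTES = {
    "/superset/warm_up_cache/": {"GET"},     # deprecated endpoint (path is an assumption)
    "/api/v1/chart/warm_up_cache": {"PUT"},  # current REST endpoint
}


def dispatch(method: str, path: str) -> int:
    """Return the HTTP status a method-aware router would produce."""
    allowed = ROUTES.get(path)
    if allowed is None:
        return 404  # no rule matches the path at all
    if method not in allowed:
        return 405  # the path matches, but not for this method
    return 200


if __name__ == "__main__":
    # A PUT sent to the GET-only path is rejected; the same PUT to the REST path is accepted.
    print(dispatch("PUT", "/superset/warm_up_cache/"))
    print(dispatch("PUT", "/api/v1/chart/warm_up_cache"))
```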

I have a fix for this, which I will PR shortly (#28706).

How to reproduce the bug

  1. Ensure you have a cache-warmup task scheduled as below
  2. Ensure you have at least one dashboard with a data-backed chart
  3. Monitor the celery worker logs

Superset version

4.0.1

Python version

3.10

Node version

Not applicable

Browser

Not applicable

Additional context

Logs/stacktrace:

[2024-05-24 22:30:00,045: INFO/ForkPoolWorker-1] cache-warmup[ae85f28c-acfa-4bb1-a885-e7f0121610b9]: Loading strategy
[2024-05-24 22:30:00,045: INFO/ForkPoolWorker-1] cache-warmup[ae85f28c-acfa-4bb1-a885-e7f0121610b9]: Loading TopNDashboardsStrategy
[2024-05-24 22:30:00,046: INFO/ForkPoolWorker-1] cache-warmup[ae85f28c-acfa-4bb1-a885-e7f0121610b9]: Success!
HTTPException
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1788, in dispatch_request
    self.raise_routing_exception(req)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1770, in raise_routing_exception
    raise request.routing_exception # type: ignore
  File "/usr/local/lib/python3.10/site-packages/flask/ctx.py", line 351, in match_request
    result = self.url_adapter.match(return_rule=True) # type: ignore
  File "/usr/local/lib/python3.10/site-packages/werkzeug/routing/map.py", line 619, in match
    raise MethodNotAllowed(valid_methods=list(e.have_match_for)) from None
werkzeug.exceptions.MethodNotAllowed: 405 Method Not Allowed: The method is not allowed for the requested URL.
[repeated]

We run with the following CeleryConfig:

from celery.schedules import crontab


class CeleryConfig(object):
    broker_url = celery_redis_broker_url
    imports = (
        "superset.sql_lab",
        "superset.tasks.scheduler",
    )
    result_backend = celery_redis_result_url
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
    }
    beat_schedule = {
        [snip]
        "cache-warmup-hourly": {
            "task": "cache-warmup",
            "schedule": crontab(minute="*/30", hour="*"),
            "kwargs": {
                "strategy_name": "top_n_dashboards",
                "top_n": 10,
                "since": "7 days ago",
            },
        },
    }

dosubot[bot] commented 2 months ago

Hey there, @rmasters! I'm a bot designed to help you out with bugs, questions, and becoming a contributor. I'm here to assist you with the issue you've encountered while waiting for a human maintainer. Let's tackle this together!

To fix the "405 Method Not Allowed" error in the chart cache warm-up task for Superset version 4.0.1, ensure that the request is sent with the PUT method to /api/v1/chart/warm_up_cache. That endpoint is defined to accept only PUT requests.

Here is the relevant code snippet from superset/charts/api.py:

@expose("/warm_up_cache", methods=("PUT",))
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
    action=lambda self, *args, **kwargs: f"{self.__class__.__name__}"
    f".warm_up_cache",
    log_to_statsd=False,
)
def warm_up_cache(self) -> Response:
    """Warm up the cache for the chart.
    ---
    put:
      summary: Warm up the cache for the chart
      description: >-
        Warms up the cache for the chart.
        Note for slices a force refresh occurs.
        In terms of the `extra_filters` these can be obtained from records in the JSON
        encoded `logs.json` column associated with the `explore_json` action.
      requestBody:
        description: >-
          Identifies the chart to warm up cache for, and any additional dashboard or
          filter context to use.
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/ChartCacheWarmUpRequestSchema"
      responses:
        200:
          description: Each chart's warmup status
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ChartCacheWarmUpResponseSchema"
        400:
          $ref: '#/components/responses/400'
        404:
          $ref: '#/components/responses/404'
        500:
          $ref: '#/components/responses/500'
    """
    try:
        body = ChartCacheWarmUpRequestSchema().load(request.json)
    except ValidationError as error:
        return self.response_400(message=error.messages)
    try:
        result = ChartWarmUpCacheCommand(
            body["chart_id"],
            body.get("dashboard_id"),
            body.get("extra_filters"),
        ).run()
        return self.response(200, result=[result])
    except CommandException as ex:
        return self.response(ex.status, message=ex.message)

Ensure that your request to warm up the cache uses the PUT method. For example, using curl:

curl -X PUT -H "Content-Type: application/json" -d '{"chart_id": 1, "dashboard_id": 2, "extra_filters": []}' http://your-superset-url/api/v1/chart/warm_up_cache

Or using Python's requests library:

import requests

url = "http://your-superset-url/api/v1/chart/warm_up_cache"
data = {
    "chart_id": 1,
    "dashboard_id": 2,
    "extra_filters": []
}
headers = {
    "Content-Type": "application/json"
}

response = requests.put(url, json=data, headers=headers)
print(response.status_code, response.json())

Additionally, check the URL that the cache warm-up task targets in superset/tasks/cache.py. The fetch_url function should send its PUT request to the chart REST API endpoint rather than to the deprecated view:

@celery_app.task(name="fetch_url")
def fetch_url(data: str, headers: dict[str, str]) -> dict[str, str]:
    """
    Celery job to fetch url
    """
    result = {}
    try:
        # Target the PUT-only REST endpoint. "Superset.warm_up_cache" resolves to
        # the deprecated GET view, which answers a PUT with 405.
        url = get_url_path("ChartRestApi.warm_up_cache")
        logger.info("Fetching %s with payload %s", url, data)
        req = request.Request(
            url, data=bytes(data, "utf-8"), headers=headers, method="PUT"
        )
        response = request.urlopen(  # pylint: disable=consider-using-with
            req, timeout=600
        )
        logger.info(
            "Fetched %s with payload %s, status code: %s", url, data, response.code
        )
        if response.code == 200:
            result = {"success": data, "response": response.read().decode("utf-8")}
        else:
            result = {"error": data, "status_code": response.code}
            logger.error(
                "Error fetching %s with payload %s, status code: %s",
                url,
                data,
                response.code,
            )
    except URLError as err:
        logger.exception("Error warming up cache!")
        result = {"error": data, "exception": str(err)}
    return result
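
When debugging a 405 like this one, the response's `Allow` header shows which methods the URL actually accepts (RFC 9110 requires a server to send it with a 405). The following is a minimal standard-library sketch for checking that. The `probe` helper is hypothetical and not part of Superset, and the URL you pass in depends on your deployment. A real Superset instance will also expect authentication, which is omitted here.

```python
from urllib import request
from urllib.error import HTTPError, URLError


def probe(url: str, method: str = "PUT", body: bytes = b"{}") -> dict:
    """Send one request and report the status plus any Allow header."""
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method=method,
    )
    try:
        with request.urlopen(req, timeout=10) as resp:
            return {"status": resp.status, "allow": resp.headers.get("Allow")}
    except HTTPError as err:
        # HTTPError subclasses URLError, so it has to be caught first.
        return {"status": err.code, "allow": err.headers.get("Allow")}
    except URLError as err:
        # Connection-level failure: there is no HTTP status to report.
        return {"status": None, "error": str(err.reason)}
```

A result such as `{"status": 405, "allow": "GET"}` would indicate that the URL exists but only accepts GET, which matches the root cause described in this issue.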

Make sure the request method is PUT to avoid the "405 Method Not Allowed" error [1][2].

guga-kudos commented 4 weeks ago

We are also experiencing this problem here after upgrading from 2.x to 3.1.3. Will the changes be backported? 😃