apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

Error warming up cache: Error 302 #27160

Open Cerberus112 opened 6 months ago

Cerberus112 commented 6 months ago

Bug description

Cache warm-up does not work when configured on the latest version (3.1.1rc1) or the previous one (3.1.0) in a Kubernetes environment (with Helm chart version 0.2.15 or earlier).

When the task is triggered, the Superset worker logs show an HTTP 308 error when requesting the API endpoint.

Note: Reports are working correctly on the same worker.

How to reproduce the bug

  1. Apply the cache warm-up config in a Kubernetes environment
  2. Review the logs of the Superset worker

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

The values.yaml (cache warm-up config):

  celery_conf: |
    from celery.schedules import crontab
    class CeleryConfig:
      broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
      imports = (
          "superset.sql_lab",
          "superset.tasks.cache",
          "superset.tasks.scheduler",
      )
      result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
      task_annotations = {
          "sql_lab.get_sql_results": {
              "rate_limit": "100/s",
          },
      }
      beat_schedule = {
          "reports.scheduler": {
              "task": "reports.scheduler",
              "schedule": crontab(minute="*", hour="*"),
          },
          "reports.prune_log": {
              "task": "reports.prune_log",
              'schedule': crontab(minute=0, hour=0),
          },
          'cache-warmup-hourly': {
              "task": "cache-warmup",
              "schedule": crontab(minute="*/2", hour="*"), ## for testing
              "kwargs": {
                  "strategy_name": "dummy"
              },
          }
      }
    CELERY_CONFIG = CeleryConfig
    THUMBNAIL_SELENIUM_USER = "admin"

Superset worker logs:

[2024-02-19 14:26:00,227: INFO/ForkPoolWorker-1] fetch_url[ecc6c59f-1a81-472c-bb3c-25daf1ccb203]: Fetching http://url.of.my.site/superset/warm_up_cache/ with payload {"chart_id": 43}
[2024-02-19 14:22:00,263: ERROR/ForkPoolWorker-3] fetch_url[ecc6c59f-1a81-472c-bb3c-25daf1ccb203]: Error warming up cache!
Traceback (most recent call last):
  File "/app/superset/tasks/cache.py", line 242, in fetch_url
    response = request.urlopen(  # pylint: disable=consider-using-with
  File "/usr/local/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/local/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 308: Permanent Redirect

craig-rueda commented 6 months ago

It could be an issue with the worker's webdriver config or something similar. It's unclear from the logs what the initial cause of the 308 is.

Cerberus112 commented 6 months ago

> Could be an issue with the worker's webdriver config or something. Unclear from the logs as to what the initial cause of the 308 is.

I also considered this possibility, but reports are being sent correctly, and they use the webdriver. For example:

[2024-02-21 09:32:00,077: INFO/ForkPoolWorker-1] Scheduling alert test_report eta: 2024-02-21 09:32:00
Executing alert/report, task id: 4a38bbbe-0f97-4031-afb1-6829668a754b, scheduled_dttm: 2024-02-21T09:32:00
[2024-02-21 09:32:00,082: INFO/ForkPoolWorker-1] Executing alert/report, task id: 4a38bbbe-0f97-4031-afb1-6829668a754b, scheduled_dttm: 2024-02-21T09:32:00
session is validated: id 9, executionid: 4a38bbbe-0f97-4031-afb1-6829668a754b
[2024-02-21 09:32:00,083: INFO/ForkPoolWorker-1] session is validated: id 9, executionid: 4a38bbbe-0f97-4031-afb1-6829668a754b
Running report schedule 4a38bbbe-0f97-4031-afb1-6829668a754b as user admin
[2024-02-21 09:32:00,116: INFO/ForkPoolWorker-1] Running report schedule 4a38bbbe-0f97-4031-afb1-6829668a754b as user admin
Report sent to email, notification content is {'notification_type': 'Report', 'notification_source': <ReportSourceFormat.DASHBOARD: 'dashboard'>, 'notification_format': 'PNG', 'chart_id': None, 'dashboard_id': 3, 'owners': [Superset Admin]}
[2024-02-21 09:32:14,119: INFO/ForkPoolWorker-1] Report sent to email, notification content is {'notification_type': 'Report', 'notification_source': <ReportSourceFormat.DASHBOARD: 'dashboard'>, 'notification_format': 'PNG', 'chart_id': None, 'dashboard_id': 3, 'owners': [Superset Admin]}

Anyway, I'll also include the configuration of the superset worker:

supersetWorker:
  affinity: {}
  autoscaling:
    enabled: false
    maxReplicas: 100
    minReplicas: 1
    targetCPUUtilizationPercentage: 80
  command:
  - /bin/sh
  - -c
  - |
    # Install chrome webdriver
    # See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/docs/installation/email_reports.mdx
    apt-get update
    apt-get install wget unzip zip -y
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    apt-get install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb
    wget https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/121.0.6167.85/linux64/chromedriver-linux64.zip
    #unzip chromedriver_linux64.zip
    #chmod +x chromedriver
    #mv chromedriver /usr/bin
    unzip chromedriver-linux64.zip
    chmod +x chromedriver-linux64/chromedriver
    mv chromedriver-linux64/chromedriver /usr/bin
    apt-get autoremove -yqq --purge
    apt-get clean
    #rm -f google-chrome-stable_current_amd64.deb chromedriver-linux64.zip

    # Run
    . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
  containerSecurityContext: {}
  deploymentAnnotations: {}
  deploymentLabels: {}
  extraContainers: []
  forceReload: false
  initContainers:
  - command:
    - /bin/sh
    - -c
    - dockerize -wait "tcp://$DB_HOST:$DB_PORT" -wait "tcp://$REDIS_HOST:$REDIS_PORT"
      -timeout 120s
    envFrom:
    - secretRef:
        name: '{{ tpl .Values.envFromSecret . }}'
    image: '{{ .Values.initImage.repository }}:{{ .Values.initImage.tag }}'
    imagePullPolicy: '{{ .Values.initImage.pullPolicy }}'
    name: wait-for-postgres-redis
  livenessProbe:
    exec:
      command:
      - sh
      - -c
      - celery -A superset.tasks.celery_app:app inspect ping -d celery@$HOSTNAME
    failureThreshold: 3
    initialDelaySeconds: 120
    periodSeconds: 60
    successThreshold: 1
    timeoutSeconds: 60
  podAnnotations: {}
  podLabels: {}
  podSecurityContext: {}
  readinessProbe: {}
  replicaCount: 1
  resources: {}
  startupProbe: {}
  strategy: {}
  topologySpreadConstraints: []

and the worker's startup logs:

Saving to: ‘chromedriver-linux64.zip’
Archive:  chromedriver-linux64.zip
  inflating: chromedriver-linux64/LICENSE.chromedriver  
  inflating: chromedriver-linux64/chromedriver  
logging was configured successfully
2024-02-21 09:21:34,875:INFO:superset.utils.logging_configurator:logging was configured successfully
2024-02-21 09:21:34,878:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/usr/local/lib/python3.9/site-packages/flask_limiter/extension.py:293: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
  warnings.warn(
/usr/local/lib/python3.9/site-packages/celery/platforms.py:840: SecurityWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!

Please specify a different user using the --uid option.

User information: uid=0 euid=0 gid=0 egid=0

  warnings.warn(SecurityWarning(ROOT_DISCOURAGED.format(
Loaded your LOCAL configuration at [/app/pythonpath/superset_config.py]

 -------------- celery@superset-worker-7db568d57c-dht8w v5.2.2 (dawn-chorus)
--- ***** ----- 
-- ******* ---- Linux-3.10.0-1160.71.1.el7.x86_64-x86_64-with-glibc2.36 2024-02-21 09:21:36
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         __main__:0x7f0fd54bbb20
- ** ---------- .> transport:   redis://superset-redis-headless:6379/0
- ** ---------- .> results:     redis://superset-redis-headless:6379/0
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery
Cerberus112 commented 6 months ago

UPDATE:

Finally, I found that the redirection was due to WEBDRIVER_BASEURL not being configured at the service level.

WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:8088/"
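
In the Helm chart, that line can be delivered through `configOverrides` so it lands in `superset_config.py` (a sketch; the `webdriver_baseurl` key name is arbitrary):

```yaml
configOverrides:
  # Point the worker's cache warm-up requests at the in-cluster
  # service instead of the external hostname, avoiding the
  # ingress-side 308 redirect.
  webdriver_baseurl: |
    WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:8088/"
```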

However, I now receive 400 errors due to a missing CSRF token when trying to warm up the cache, both from the worker and externally via the API.

{'errors': [{'message': '400 Bad Request: The CSRF session token is missing.', 'error_type': 'GENERIC_BACKEND_ERROR', 'level': 'error', 'extra': {'issue_codes': [{'code': 1011, 'message': 'Issue 1011 - Superset encountered an unexpected error.'}]}}]}

If I disable CSRF:

WTF_CSRF_ENABLED = False

it returns 302:

[2024-02-29 17:00:00,368: INFO/ForkPoolWorker-2] fetch_url[356f2e18-4069-4f16-a8aa-d3bee8323296]: Fetching http://superset:8088/api/v1/chart/warm_up_cache with payload {"chart_id": 49}
[2024-02-29 17:00:00,377: ERROR/ForkPoolWorker-3] fetch_url[ba8a1804-8ea0-460a-8892-da8f8fa9b733]: Error warming up cache!
Traceback (most recent call last):
  File "/app/superset/tasks/cache.py", line 242, in fetch_url
    response = request.urlopen(  # pylint: disable=consider-using-with
  File "/usr/local/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/local/lib/python3.9/urllib/request.py", line 555, in error
    result = self._call_chain(*args)
  File "/usr/local/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.9/urllib/request.py", line 726, in http_error_302
    new = self.redirect_request(req, fp, code, msg, headers, newurl)
  File "/usr/local/lib/python3.9/urllib/request.py", line 664, in redirect_request
    raise HTTPError(req.full_url, code, msg, headers, fp)
urllib.error.HTTPError: HTTP Error 302: FOUND
[2024-02-29 17:00:00,378: ERROR/ForkPoolWorker-2] fetch_url[356f2e18-4069-4f16-a8aa-d3bee8323296]: Error warming up cache!
Traceback (most recent call last):
  File "/app/superset/tasks/cache.py", line 242, in fetch_url
    response = request.urlopen(  # pylint: disable=consider-using-with
  File "/usr/local/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/local/lib/python3.9/urllib/request.py", line 555, in error
    result = self._call_chain(*args)
  File "/usr/local/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.9/urllib/request.py", line 726, in http_error_302
    new = self.redirect_request(req, fp, code, msg, headers, newurl)
  File "/usr/local/lib/python3.9/urllib/request.py", line 664, in redirect_request
    raise HTTPError(req.full_url, code, msg, headers, fp)
urllib.error.HTTPError: HTTP Error 302: FOUND
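
Rather than disabling CSRF globally, a narrower workaround may be Flask-WTF's exemption list in `superset_config.py`. This is a sketch I have not verified on this deployment, and the dotted view name below is an assumption that would need to match the actual warm-up endpoint:

```python
# superset_config.py -- hedged sketch, not verified on this deployment.
# WTF_CSRF_EXEMPT_LIST takes dotted view names that Flask-WTF skips
# CSRF checks for; the warm-up view's dotted path here is an assumption.
WTF_CSRF_EXEMPT_LIST = [
    "superset.charts.api.warm_up_cache",
]
```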

Related errors reported previously:

- https://github.com/apache/superset/issues/24717#issue-1808086329
- https://github.com/apache/superset/issues/24579#issue-1786075530
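
For anyone exercising the endpoint from the logs by hand: the request needs real auth, not just a reachable URL. A minimal, stdlib-only sketch of how such a request would be assembled — the service URL, tokens, and cookie are placeholders, and the tokens would come from `/api/v1/security/login` and `/api/v1/security/csrf_token/`:

```python
import json
import urllib.request

# Assumption: in-cluster service name and port, as seen in the logs above.
BASE_URL = "http://superset:8088"

def build_warm_up_request(base_url, chart_id, access_token, csrf_token, session_cookie):
    """Build the authenticated POST for /api/v1/chart/warm_up_cache:
    a bearer token (from /api/v1/security/login) plus the CSRF token and
    the session cookie that issued it (from /api/v1/security/csrf_token/)."""
    payload = json.dumps({"chart_id": chart_id}).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/chart/warm_up_cache",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
            "X-CSRFToken": csrf_token,
            "Cookie": session_cookie,
        },
        method="POST",
    )

req = build_warm_up_request(BASE_URL, 49, "<access-token>", "<csrf-token>", "session=<cookie>")
print(req.get_method(), req.full_url)
```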

dmuldoonadl commented 6 months ago

I'm experiencing the same issue.

rusackas commented 1 month ago

Just a heads-up: while we're trying to get the linked PR merged, we're also no longer supporting 3.0.x, and will stop supporting 3.x.x when Superset 4.1 is released soon. If anyone can confirm this is indeed currently a 4.x.x issue, that'd be appreciated!

sanjaynayak007 commented 2 weeks ago

I am experiencing the cache warmup issue in Superset version 4.0.2.

[2024-08-28 10:30:01,206: INFO/ForkPoolWorker-2] fetch_url[d490c4c1-2b0a-4078-8f5a-7abc7f8f96ca]: Fetching http://superset:8088/superset/warm_up_cache/ with payload {"chart_id": 125, "dashboard_id": 22}
[2024-08-28 10:30:01,212: ERROR/ForkPoolWorker-2] fetch_url[d490c4c1-2b0a-4078-8f5a-7abc7f8f96ca]: Error warming up cache!
Traceback (most recent call last):
  File "/app/superset/tasks/cache.py", line 227, in fetch_url
    response = request.urlopen(  # pylint: disable=consider-using-with
  File "/usr/local/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/local/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 405: METHOD NOT ALLOWED
tamarinkeisari commented 5 days ago

> I am experiencing the cache warmup issue in Superset version 4.0.2. (same 405 traceback as above)

I am facing the same problem, also on version 4.0.2. Have you found a solution?

I tried adding some code that someone said would solve it, and now it shows me the same error but with code 400. Thank you so much!