Closed Pinimo closed 2 years ago
BTW, I noticed there is such a login procedure (with a headless browser) in the email report generator. Perhaps that procedure could be factored out and reused to warm up the cache in our case?
@Pinimo what does your cache config look like?
I set up Redis caching with a timeout of 5 minutes and cache warmup with the top_n_dashboards strategy every 2 minutes (just to test).
I can see this in Celery worker logs:
[2020-04-21 00:36:00,009: INFO/ForkPoolWorker-1] cache-warmup[e41de539-0bf7-4e70-b02b-4d2a132d8d0e]: Loading strategy
[2020-04-21 00:36:00,010: INFO/ForkPoolWorker-1] cache-warmup[e41de539-0bf7-4e70-b02b-4d2a132d8d0e]: Loading TopNDashboardsStrategy
[2020-04-21 00:36:00,014: INFO/ForkPoolWorker-1] cache-warmup[e41de539-0bf7-4e70-b02b-4d2a132d8d0e]: Success!
[2020-04-21 00:36:00,043: INFO/ForkPoolWorker-1] cache-warmup[e41de539-0bf7-4e70-b02b-4d2a132d8d0e]: Fetching http://0.0.0.0:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%201%7D
[2020-04-21 01:06:00,131: INFO/ForkPoolWorker-2] cache-warmup[d2d68627-adce-4fa5-852e-522e95350a6c]: {'success': ['http://0.0.0.0:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%201%7D'], 'errors': []}
but in Superset logs, I only see:
superset_1 | 2020-04-21 00:36:00,049 [DEBUG] [stats_logger] (incr) explore
Needless to say, my charts are not being updated
This is my config:
from celery.schedules import crontab  # needed for the schedule below

CACHE_DEFAULT_TIMEOUT = 300
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 180,
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}

class CeleryConfig(object):
    BROKER_URL = 'redis://localhost:6379/0'
    CELERY_IMPORTS = (
        'superset.sql_lab',
        'superset.tasks',
    )
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
    CELERYD_LOG_LEVEL = 'DEBUG'
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERYBEAT_SCHEDULE = {
        'cache-warmup-hourly': {
            'task': 'cache-warmup',
            'schedule': crontab(minute='*/2', hour='*'),
            'kwargs': {
                'strategy_name': 'top_n_dashboards',
                'top_n': 5,
                'since': '7 days ago',
            },
        },
    }

CELERY_CONFIG = CeleryConfig
@jayhjha Perhaps it would be worth changing your config variable SUPERSET_SERVER_ADDRESS to "superset".
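For context, the worker logs above fetch http://0.0.0.0:8088/..., which from inside a separate worker container does not point at the web server. A minimal sketch of the suggested override in superset_config.py follows. It assumes the setting meant here is SUPERSET_WEBSERVER_ADDRESS (the name I believe superset/config.py uses) and that the web server's Docker Compose service is called "superset", as the superset_1 log prefix suggests; adjust both to your setup.

```python
# superset_config.py (sketch, not a verified fix)
# Assumption: the warm-up task builds its URLs from these settings.
SUPERSET_WEBSERVER_ADDRESS = "superset"  # Compose service name, reachable from the worker
SUPERSET_WEBSERVER_PORT = 8088  # port seen in the worker logs
```

This only addresses reachability; it does not solve the missing authentication discussed in the rest of the thread.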
Any news on this?
A colleague made a POC on this, but came to the conclusion it is already quite difficult to have the email reports working... He wanted to use part of that code (headless browser + login) to work around the login problem. I think he found out the dependencies for the feature were not included in the Dockerfile.
To my knowledge the feature is (and will stay...) broken :cry:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.
Any news from the community on this issue?
@Pinimo did you find any workaround for this issue? I am getting the exact same issue.
@mukulsaini I have not found the time to address the issue; to the best of my knowledge it has not been solved yet. If you too find this issue is a real problem, I invite you to talk it over on Superset's Slack :left_speech_bubble:
Here are a few educated guesses as to how to solve the issue:
We could work around the auth by signing in through a headless browser in the Celery process. After thinking it over, it seems difficult to me:
Perhaps a better solution would involve setting up an API server with its own authentication procedure, or a new auth method on the same server, to allow the Celery worker to perform cache requests:
POST requests that only return empty documents (not usable to extract data from the instance).
Not sure at all about this last idea: we could code a new CLI route to return the chart data. The Celery worker would then execute the CLI (if I'm remembering right, all the configs and docker images are the same). However, that would possibly exceed memory limits for the Celery worker.
Yet another draft solution:
db-inits, with a very specific caching role. I find it important that this role should never be able to actually extract any data (so that a stolen password would matter little), just to ping the server and get it to cache the data. It would even be possible to modify the @login_required decorator to add the constraint:
__cache_worker, then
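To make the "caching role that cannot extract data" idea more concrete, here is a rough sketch in plain Python. It is not Flask-AppBuilder or Superset code: the decorator name cache_only_for_worker, the explicit role argument, fake_cache and chart_data are all hypothetical stand-ins. The point is only that the view still runs (and fills the cache), while the cache-worker role gets an empty document back.

```python
from functools import wraps

CACHE_WORKER_ROLE = "__cache_worker"  # role name mentioned in the comment above


def cache_only_for_worker(view):
    """Always run the view, but return an empty document to the cache-worker role."""

    @wraps(view)
    def wrapper(role, *args, **kwargs):
        payload = view(*args, **kwargs)  # side effect: the view populates the cache
        if role == CACHE_WORKER_ROLE:
            return ""  # nothing to extract, even if the credentials leak
        return payload

    return wrapper


# Hypothetical stand-ins for a cache backend and a chart endpoint.
fake_cache = {}


@cache_only_for_worker
def chart_data(slice_id):
    fake_cache[slice_id] = f"data for slice {slice_id}"
    return fake_cache[slice_id]
```

In a real application the role would come from the session of the logged-in user rather than from an argument.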
Mmmh, maybe I'm missing something, but it seems like we shouldn't have to go through the web server to do this.
Refactoring / mimicking what explore_json does might be an option.
https://github.com/apache/incubator-superset/blob/master/superset/views/core.py#L525-L536
@mistercrunch From what I see in the previous commits, the route used for cache warmup in cache.py get_url was /explore_json. Any reason it was changed to /explore?
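For readers following the route question, this sketch shows how a warm-up URL like the one in the worker logs can be built and decoded with the standard library. It is not the code from cache.py: warmup_url and slice_id_from_url are hypothetical helpers, and the /superset/explore_json/ path in the usage note is an assumption about the older route.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit

BASE_URL = "http://0.0.0.0:8088"  # address seen in the worker logs


def warmup_url(slice_id, route="/superset/explore/"):
    """Build a URL whose form_data parameter is URL-encoded JSON."""
    form_data = quote(json.dumps({"slice_id": slice_id}))
    return f"{BASE_URL}{route}?form_data={form_data}"


def slice_id_from_url(url):
    """Decode form_data back into JSON and return the slice id."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["form_data"][0])["slice_id"]
```

With these helpers, warmup_url(1) reproduces the URL logged by the worker, and warmup_url(1, route="/superset/explore_json/") would target the older route if that path is correct.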
Any news or workarounds for avoiding the 302 to the login endpoint?
I still run into this issue using the latest Docker image (warm-up succeeds on the worker, Superset logs show a redirect to login, no caches refreshed). Not being able to warm up caches periodically feels like a vital missing feature.
Cache warmup is not working on my Superset 1.2 (Docker, Ubuntu 18.04); my Superset log is below.
Is this a known defect in the current version? Any suggestions or workarounds?
superset_worker | [2021-07-17 16:20:00,031: INFO/ForkPoolWorker-1] cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03]: Loading strategy
superset_worker | [2021-07-17 16:20:00,031: INFO/ForkPoolWorker-1] cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03]: Loading TopNDashboardsStrategy
superset_worker | [2021-07-17 16:20:00,032: INFO/ForkPoolWorker-1] cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03]: Success!
superset_worker | [2021-07-17 16:20:00,049: INFO/ForkPoolWorker-1] cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03]: Fetching http://0.0.0.0:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%20164%7D
superset_worker | [2021-07-17 16:20:00,051: ERROR/ForkPoolWorker-1] cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03]: Error warming up cache!
superset_worker | Traceback (most recent call last):
superset_worker |   File "/usr/local/lib/python3.7/urllib/request.py", line 1350, in do_open
superset_worker |     encode_chunked=req.has_header('Transfer-encoding'))
superset_worker |   File "/usr/local/lib/python3.7/http/client.py", line 1277, in request
superset_worker |     self._send_request(method, url, body, headers, encode_chunked)
superset_worker |   File "/usr/local/lib/python3.7/http/client.py", line 1323, in _send_request
superset_worker |     self.endheaders(body, encode_chunked=encode_chunked)
superset_worker |   File "/usr/local/lib/python3.7/http/client.py", line 1272, in endheaders
superset_worker |     self._send_output(message_body, encode_chunked=encode_chunked)
superset_worker |   File "/usr/local/lib/python3.7/http/client.py", line 1032, in _send_output
superset_worker |     self.send(msg)
superset_worker |   File "/usr/local/lib/python3.7/http/client.py", line 972, in send
superset_worker |     self.connect()
superset_worker |   File "/usr/local/lib/python3.7/http/client.py", line 944, in connect
superset_worker |     (self.host,self.port), self.timeout, self.source_address)
superset_worker |   File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
superset_worker |     raise err
superset_worker |   File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
superset_worker |     sock.connect(sa)
superset_worker | ConnectionRefusedError: [Errno 111] Connection refused
superset_worker |
superset_worker | During handling of the above exception, another exception occurred:
superset_worker |
superset_worker | Traceback (most recent call last):
superset_worker |   File "/app/superset/tasks/cache.py", line 291, in cache_warmup
superset_worker |     request.urlopen(url)
superset_worker |   File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
superset_worker |     return opener.open(url, data, timeout)
superset_worker |   File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
superset_worker |     response = self._open(req, data)
superset_worker |   File "/usr/local/lib/python3.7/urllib/request.py", line 543, in _open
superset_worker |     '_open', req)
superset_worker |   File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
superset_worker |     result = func(*args)
superset_worker |   File "/usr/local/lib/python3.7/urllib/request.py", line 1378, in http_open
superset_worker |     return self.do_open(http.client.HTTPConnection, req)
superset_worker |   File "/usr/local/lib/python3.7/urllib/request.py", line 1352, in do_open
superset_worker |     raise URLError(err)
superset_worker | urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
superset_worker | [2021-07-17 16:20:00,054: INFO/ForkPoolWorker-2] Task reports.scheduler[8a25cd68-3d17-4969-bb69-752aee1fb177] succeeded in 0.02081612199981464s: None
superset_worker | [2021-07-17 16:20:00,055: INFO/ForkPoolWorker-1] Task cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03] succeeded in 0.024644025999805308s: {'success': [], 'errors': ['http://0.0.0.0:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%20164%7D']}
Yes, unfortunately the cache warmup code is missing auth. It should be fairly simple to pass auth cookies in the request though; most of the code is already in Superset. There's a function that returns auth cookies given a user (can use the THUMBNAIL_SELENIUM_USER config, as that user should be an admin user that can view all charts); it lives here. So you would simply have to add the code that grabs the cookies and then include the Cookie header in each chart request.
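As an illustration of the last step only, here is a minimal sketch of attaching cookies to a warm-up request with urllib. It assumes the auth cookies have already been obtained as a plain dict of names to values; build_authenticated_request is a hypothetical helper, not a Superset function.

```python
from urllib import request


def build_authenticated_request(url, cookies):
    """Return a Request that carries the given cookies in a single Cookie header."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return request.Request(url, headers={"Cookie": cookie_header})
```

The returned object can be passed to request.urlopen in place of the bare URL.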
Additionally, it seems that the Celery worker is reporting success since 302s do not throw URLError exceptions. The code should probably be updated to only report success if the response is in the 200 range.
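One possible way to do that with urllib is sketched below. Because urlopen follows redirects by default, the final status after a 302 to the login page is still 200, so this sketch also disables redirect following. NoRedirect and fetch_ok are hypothetical names, and this is not the change that was eventually made in Superset.

```python
from urllib import error, request


class NoRedirect(request.HTTPRedirectHandler):
    """Refuse to follow redirects so that a 3xx surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_ok(url):
    """Return True only if the URL answers with a 2xx status and no redirect."""
    opener = request.build_opener(NoRedirect)
    try:
        with opener.open(url) as response:
            return 200 <= response.status < 300
    except error.URLError:
        # HTTPError (3xx, 4xx, 5xx) is a subclass of URLError, so this also
        # covers the unfollowed 302 as well as refused connections.
        return False
```

The warm-up task could then append a URL to its success list only when fetch_ok returns True.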
superset 1.3.2 (ecs not docker env) python 3.7.9
I tried the warmup feature last week and found several problems:
I also read the cache code in the master branch, and nothing has changed.
This issue is still not fixed, even with the latest image. Could someone please shed some light on this?
I'm quite confused how this is advertised in the documentation as a way to warm the cache, yet is impossible since all the cache-warmup strategies are stuck behind the auth wall... is the community overlooking something?
Tagging original author @betodealmeida and recent maintainers @graceguo-supercat & @john-bodley
At Lyft, when I worked on this, we had a custom security manager, and we could access any endpoint using a master token that gave full permissions.
@betodealmeida I started to explore slapping on a token, but noticed everything was cookie based so I didn't pursue it much further.
I managed to just grab the Set-Cookie by making a login request and then attaching the cookie to the cache warming requests https://github.com/apache/superset/commit/f57fab7f8c22e23293ac049814a6c5bb34df9c2a
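The general shape of that approach can be sketched with the standard library's cookie jar, which stores any Set-Cookie from the login response and replays it on later requests. This is not the code from the linked commit. The form field names username and password are assumptions, and a real Superset login form may also require a CSRF token, which this sketch does not handle.

```python
from http.cookiejar import CookieJar
from urllib import parse, request


def logged_in_opener(login_url, username, password):
    """POST the credentials once and return an opener that reuses the session cookie."""
    opener = request.build_opener(request.HTTPCookieProcessor(CookieJar()))
    form = parse.urlencode({"username": username, "password": password}).encode()
    opener.open(login_url, data=form).close()  # Set-Cookie lands in the jar
    return opener
```

Each warm-up URL would then be fetched with opener.open(url) instead of request.urlopen(url).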
I just need my graphs precomputed, saw it documented and thought this would be a good fit. Not interested in forking this, maintaining it, & building images from underlying source etc. I see you work at preset.io which is based on superset; does your on-prem offering happen to have this working?
Maybe i can do something with https://github.com/apache/superset/blob/b08e21efd906d13994414b39bfa7f6e98466d4cb/superset/security/api.py#L113-L162
But looks like it would need to be regenerated whenever a new dashboard is added https://github.com/apache/superset/blob/b08e21efd906d13994414b39bfa7f6e98466d4cb/superset/security/manager.py#L1334-L1349
Ah, you're right, I don't think there's an easy way to customize the celery workers to pass a custom token in the request. Let me take another look at this, it's been a couple years.
@ajwhite @betodealmeida I sent a PR to address this issue, which is working in my environment.
Globally: the cache warm-up tasks launched by Celery workers all silently fail. Indeed, they perform GETs on the main server's URL without providing the required authentication. However, dashboards may not be loaded without being logged in.
Related bugs:
--beat flag to listen on CeleryBeat schedules (cf. docker-compose.yml configuration)
At stake: long dashboard load times for our users, or outdated dashboards.
Main files to be fixed:
superset/tasks/cache.py
Expected results
When the Celery worker logs this (notice 'errors': []):
... we would expect to have something (more or less) like this in the Superset server logs:
Of course, we also hope to have a bunch of items in the Redis logs, and that loading dashboards is lightning-quick.
Actual results
But we get these logs instead, which show there is a 302 redirect to the login page, followed by a 200 on the login page. This redirect is interpreted as a success by the tests.
(I added a few line breaks)
In Redis, here is the only stored key:
Last, the dashboards take time loading the data on the first connection.
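The 302-then-200 pattern described above can be detected from the client side by letting urlopen follow the redirect and then checking where it ended up. The sketch below assumes the login page lives at /login/; LOGIN_PATH and redirected_to_login are hypothetical names introduced for illustration.

```python
from urllib import parse, request

LOGIN_PATH = "/login/"  # assumption: path of the login page the 302 points to


def redirected_to_login(url):
    """Follow redirects as urlopen does, then report whether we ended on the login page."""
    with request.urlopen(url) as response:
        final_path = parse.urlsplit(response.geturl()).path
    return final_path.rstrip("/") == LOGIN_PATH.rstrip("/")
```

A warm-up request for which this returns True reached the login form rather than the chart, so it should be counted as an error, not a success.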
Screenshots
None
How to reproduce the bug
I had to patch the master branch to get this to work. In particular, I have to admit it was not very clear to me whether the config was read from file docker/pythonpath_dev/superset_config.py or file superset/config.py. So I kind of adapted superset/config.py and copied it over to the pythonpath one (which looks like it is read by the celery worker, but not the server).
Anyway, this reproduces the bug:
$ docker system prune --all
to remove all unused images and stopped containers (volumes are only removed if --volumes is added).
$ git checkout master && git pull origin master
$ wget -O configs.patch https://gist.githubusercontent.com/Pinimo/c339ea828974d2141423b6ae64192aa4/raw/e449c97c11f81f7270d6e0b2369d55ec41b079a9/0001-bug-Patch-master-to-reproduce-sweetly-the-cache-warm.patch && git apply configs.patch
This will apply patches to master to make the scenario work out neatly, in particular add the --beat flag and specify a cache warmup task on all dashboards every minute.
$ docker-compose up -d
$ docker-compose logs superset-worker | grep cache-warmup
$ docker-compose logs superset | grep slice
$ docker-compose exec redis redis-cli
then type KEYS *