apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

error 37 on big query results #681

Closed: LAlbertalli closed this issue 5 years ago

LAlbertalli commented 8 years ago

Hi,

I ran into this issue using Caravel with queries that return large datasets. With Memcache enabled, I receive an error that says: "error 37 from memcached_set: SUCCESS".

This is the full stack trace

[Mon Jun 27 16:51:39.793897 2016] [:error] [pid 31331] SELECT user_id AS user_id, COUNT(*) AS count 
[Mon Jun 27 16:51:39.793938 2016] [:error] [pid 31331] FROM walkin 
[Mon Jun 27 16:51:39.793946 2016] [:error] [pid 31331] WHERE memsql_insert_time >= '2016-06-20 16:51:39.000000' AND memsql_insert_time <= '2016-06-27 16:51:39.000000' GROUP BY user_id ORDER BY count DESC 
[Mon Jun 27 16:51:39.793955 2016] [:error] [pid 31331]  LIMIT 50000
[Mon Jun 27 16:51:41.668233 2016] [:error] [pid 31331] 2016-06-27 16:51:41,668:INFO:root:Caching for the next 1200 seconds
[Mon Jun 27 16:51:42.262339 2016] [:error] [pid 31331] 2016-06-27 16:51:42,262:ERROR:root:error 37 from memcached_set: SUCCESS
[Mon Jun 27 16:51:42.262368 2016] [:error] [pid 31331] Traceback (most recent call last):
[Mon Jun 27 16:51:42.262383 2016] [:error] [pid 31331]   File "/usr/lib/python2.7/site-packages/caravel-0.9.0-py2.7.egg/caravel/views.py", line 675, in explore
[Mon Jun 27 16:51:42.262385 2016] [:error] [pid 31331]     payload = obj.get_json()
[Mon Jun 27 16:51:42.262388 2016] [:error] [pid 31331]   File "/usr/lib/python2.7/site-packages/caravel-0.9.0-py2.7.egg/caravel/viz.py", line 269, in get_json
[Mon Jun 27 16:51:42.262390 2016] [:error] [pid 31331]     cache.set(cache_key, payload, timeout=cache_timeout)
[Mon Jun 27 16:51:42.262392 2016] [:error] [pid 31331]   File "/usr/lib/python2.7/site-packages/flask_cache/__init__.py", line 200, in set
[Mon Jun 27 16:51:42.262395 2016] [:error] [pid 31331]     self.cache.set(*args, **kwargs)
[Mon Jun 27 16:51:42.262397 2016] [:error] [pid 31331]   File "/usr/lib/python2.7/site-packages/werkzeug/contrib/cache.py", line 436, in set
[Mon Jun 27 16:51:42.262400 2016] [:error] [pid 31331]     return self._client.set(key, value, timeout)
[Mon Jun 27 16:51:42.262402 2016] [:error] [pid 31331] Error: error 37 from memcached_set: SUCCESS

The cause is probably memcached's 1 MB item size limit. I'm reconfiguring memcached to raise the limit, but there's always the risk of hitting the new limit again. Caravel should be able to split the entry into multiple ones to avoid hitting the limit.

Thanks, L.
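Splitting a large entry across several cache keys could look roughly like the sketch below. It is a hypothetical helper, not part of Caravel or Flask-Cache. It assumes a cache object exposing `get(key)` and `set(key, value, timeout=...)`, as werkzeug's cache classes do. `DictCache`, `set_chunked`, `get_chunked` and `CHUNK_SIZE` are illustrative names.

```python
import pickle


class DictCache(object):
    """In-memory stand-in for a memcached-backed cache (timeouts ignored)."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, timeout=0):
        self.store[key] = value


# Hypothetical chunk size, kept under memcached's default 1 MB item limit
# to leave room for the key and per-item overhead.
CHUNK_SIZE = 1000 * 1000


def set_chunked(cache, key, value, timeout=0, chunk_size=CHUNK_SIZE):
    """Pickle ``value`` and store it across several cache entries."""
    data = pickle.dumps(value)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        cache.set("%s:%d" % (key, index), chunk, timeout=timeout)
    # The original key holds only the number of chunks.
    cache.set(key, len(chunks), timeout=timeout)


def get_chunked(cache, key):
    """Reassemble a value stored by set_chunked; return None on any miss."""
    count = cache.get(key)
    if count is None:
        return None
    parts = []
    for index in range(count):
        chunk = cache.get("%s:%d" % (key, index))
        if chunk is None:  # a chunk was evicted: treat it as a cache miss
            return None
        parts.append(chunk)
    return pickle.loads(b"".join(parts))
```

Memcached can evict individual chunks independently. For that reason `get_chunked` treats any missing piece as a full cache miss instead of returning partial data.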

xrmx commented 8 years ago

That sounds painful to handle, and ideally it's something that should be pushed down to the flask-cache memcached backend. Unfortunately, it doesn't look like flask-cache supports compression, which could have been handy in this case.

LAlbertalli commented 8 years ago

I agree, even though Caravel's use case is partially different from what Flask is usually used for.

Having worked a little with the queries that were causing the error, I noticed that they aren't really good queries anyway. But it's annoying not to be able to see the result. What could make sense is to swallow the exception but show a warning inside the slice view, informing the user that the query could not be cached. Does that make sense?

L.
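The "swallow the exception but warn the user" idea could be sketched as follows. This is a hypothetical helper, not Caravel's actual code. The `warning` key is an assumed field that the slice view would need to display. The broad `except Exception` reflects that the exact exception class raised by the memcached client is not shown in the trace above.

```python
import logging


def cache_payload(cache, cache_key, payload, timeout):
    """Try to cache ``payload``; on failure, log it and attach a warning.

    Hypothetical helper: ``cache`` is any object with
    set(key, value, timeout=...). The result set is always returned.
    """
    try:
        cache.set(cache_key, payload, timeout=timeout)
    except Exception as exc:  # e.g. memcached rejecting an oversized item
        logging.exception("Could not cache %s", cache_key)
        payload = dict(payload)  # don't mutate the caller's dict
        payload["warning"] = "The query results could not be cached: %s" % exc
    return payload
```

The user still gets the chart. The caching failure shows up as a warning instead of replacing the result with an error.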

mistercrunch commented 8 years ago

The good news is that I do compress the payload before caching it, meaning you're probably squeezing 5-10 times more than 1 MB worth of raw data: https://github.com/airbnb/caravel/blob/master/caravel/viz.py#L306
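For illustration, compressing a JSON payload before caching might look like this. The sketch assumes zlib over UTF-8 JSON, and the code at the linked viz.py line may use a different scheme. `compress_payload` and `decompress_payload` are hypothetical names, and the sample rows are synthetic, shaped like the query in the trace.

```python
import json
import zlib


def compress_payload(payload):
    """Serialize ``payload`` to JSON and zlib-compress it before caching."""
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def decompress_payload(blob):
    """Reverse compress_payload after reading the blob from the cache."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    # Synthetic rows shaped like the GROUP BY user_id query above.
    rows = [{"user_id": i, "count": i % 7} for i in range(50000)]
    raw = json.dumps(rows).encode("utf-8")
    blob = compress_payload(rows)
    print("raw: %d bytes, compressed: %d bytes" % (len(raw), len(blob)))
```

The actual ratio depends on how repetitive the result set is. A large enough query can still exceed the limit after compression.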

My intuition was that memcached would allow raising the somewhat arbitrary 1 MB limit. It looked trickier than it should have been, though it seemed possible.
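On the server side, memcached's `-I` option sets the maximum item size. On the client side, a guard could skip caching payloads that won't fit instead of letting `set` fail. The sketch below is hypothetical. `MAX_ITEM_SIZE` assumes the default 1 MB limit and should match whatever `-I` value the server uses. The comparison is approximate, because memcached also counts the key and per-item overhead.

```python
import json
import logging
import zlib

# Assumption: memcached's default 1 MB maximum item size. Raise this to match
# the server if it was started with a larger -I value.
MAX_ITEM_SIZE = 1024 * 1024


def set_if_fits(cache, key, payload, timeout, max_size=MAX_ITEM_SIZE):
    """Compress and cache ``payload`` only if it fits; return True if stored.

    Hypothetical helper: ``cache`` is any object with
    set(key, value, timeout=...).
    """
    blob = zlib.compress(json.dumps(payload).encode("utf-8"))
    if len(blob) >= max_size:  # approximate: ignores key and item overhead
        logging.warning("Not caching %s: %d bytes compressed", key, len(blob))
        return False
    cache.set(key, blob, timeout=timeout)
    return True
```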

I thought that, the way I wrote the code, it would silence the error (but log it) and serve the result set anyway. Is that not what you're seeing?

LAlbertalli commented 8 years ago

Configuring memcached is pretty painful, at least on CentOS 7, but the worst part is actually figuring out the cause of the error.

Regarding the result on the front end (FE): the app doesn't crash, but I only get an alert reporting the error (a red bar). I haven't checked whether the JSON is returned correctly.