zachliu opened 3 weeks ago
I replaced `len(data)` with:
```python
import sys
from collections import deque

def _get_size_iterative(dict_obj):
    """Iteratively find the size of an object graph in bytes."""
    seen = set()
    size = 0
    objects = deque([dict_obj])
    while objects:
        current = objects.popleft()
        # Dedupe by identity so shared objects are only counted once.
        if id(current) in seen:
            continue
        seen.add(id(current))
        size += sys.getsizeof(current)
        if isinstance(current, dict):
            objects.extend(current.keys())
            objects.extend(current.values())
        elif hasattr(current, '__dict__'):
            objects.append(current.__dict__)
        # Walk other iterables, but skip strings/bytes (already fully counted above).
        elif hasattr(current, '__iter__') and not isinstance(current, (str, bytes, bytearray)):
            objects.extend(current)
    return size
```
It works fine. The in-memory size of the dictionary is usually much larger than the on-disk size of something like a CSV file, due to Python's per-object storage overhead, but it at least gives us a relative value. That's especially informative because I'm using `data_length` in a DataDog dashboard to monitor users' query result sizes.
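For example, running the helper on a toy payload (this assumes `_get_size_iterative` and its imports from above; the payload values are illustrative and exact byte counts vary by Python version):

```python
import json

# Toy payload in the post-#6687 nested-dict shape (values illustrative).
data = {
    "columns": [{"name": "id"}, {"name": "value"}],
    "rows": [{"id": i, "value": f"row-{i}"} for i in range(1000)],
}

print(len(json.dumps(data)))      # size once serialized, roughly the on-disk scale
print(_get_size_iterative(data))  # deep in-memory size; typically several times larger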
Issue Summary
Before this PR https://github.com/getredash/redash/pull/6687, the data returned by query runners was a JSON string, so the `data_length` calculated by `len(data)` made sense: https://github.com/getredash/redash/blob/60a12e906efb8f7948fdbe5e013249b8b0c0089a/redash/tasks/queries/execution.py#L194-L200
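For context, a rough sketch of that old behavior, where `data` was already serialized (the payload shape here is illustrative):

```python
import json

# Pre-#6687: query runners returned a JSON string, so len(data)
# measured the serialized payload and grew with the result set.
data = json.dumps({"columns": [{"name": "id"}], "rows": [{"id": 1}, {"id": 2}]})
print(type(data), len(data))  # <class 'str'> and a length proportional to the result size
```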
But after https://github.com/getredash/redash/pull/6687, `data` is a nested dictionary, and `len(data)` only returns the number of top-level keys. In most cases there are only two, "columns" and "rows", so the `data_length` doesn't really give us useful information (see the sketch under Steps to Reproduce).

Steps to Reproduce
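A quick local reproduction, independent of the log search below; the payload mirrors the post-#6687 nested-dict shape, with illustrative values:

```python
# Post-#6687: data is the nested dict itself, so len() just counts
# the two top-level keys, no matter how many rows came back.
data = {
    "columns": [{"name": "id"}, {"name": "name"}],
    "rows": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
}
print(len(data))  # 2
```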
Search for `data_length=` in your logs.

Technical details: