getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
http://redash.io/
BSD 2-Clause "Simplified" License
26.5k stars 4.39k forks source link

Redash queries crash if numerical column contain `NaN` #6992

Open noxdafox opened 6 months ago

noxdafox commented 6 months ago

Issue Summary

The following commit https://github.com/getredash/redash/commit/4d5103978bdc20900f205532b48356ec80339632, introduced a regression. As simplejson was replaced with json strict JSON validation is now enforced.

The belief is that JSON validation was enforced to prevent the json module to return NaN values which break the frontend: https://github.com/getredash/redash/blob/master/redash/utils/__init__.py#L129

    # Using allow_nan = True will make Python render nan as NaN, leading to parse error in front-end

Yet, the change and the comments suggest that this was not the expected behaviour: https://github.com/getredash/redash/blob/master/redash/utils/__init__.py#L128

    # Float value nan or inf in Python should be render to None or null in json.

Previous to that commit, NaN values were allowed and automatically converted to None by simplejson. json does not support this functionality and now the behaviour is to raise ValueError effectively crashing the query every time a NaN value is encountered.

[2024-05-29 08:20:44,474][PID:966][ERROR][rq.worker] [Job 1570d8da-97d2-4ab6-8395-3d6ff4fdda8a]: exception raised while executing (redash.tasks.queries.execution.execute_query)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1204, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 873, in _init_compiled
    param = dict(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 876, in <genexpr>
    processors[key](compiled_params[key])
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/type_api.py", line 1232, in process
    return process_param(value, dialect)
  File "/app/redash/models/types.py", line 29, in process_bind_param
    return json_dumps(value)
  File "/app/redash/utils/__init__.py", line 131, in json_dumps
    return json.dumps(data, *args, **kwargs)
  File "/usr/local/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/usr/local/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
ValueError: Out of range float values are not JSON compliant

This is breaking several of our queries as null values are acceptable in many DB implementations in place of floats/doubles/reals.

To revert back to the previous behaviour, either simplejson is reinstated as JSON serialization library, or a custom JSONEncoder class is provided (and maintained) instead: https://stackoverflow.com/questions/28639953/python-json-encoder-convert-nans-to-null-instead

Technical details:

EugeneChung commented 4 months ago

This problem is originated by the PR https://github.com/getredash/redash/pull/6685 . I think it's made huge interoperability problem.

EugeneChung commented 4 months ago

I suggest to use orjson. It allows nan, too. In general, it's faster than json and simplejson.