Open dklei opened 3 years ago
Hey, I'm hitting this issue. Is it possible to fix this soon? This is quite a severe issue as it fails silently.
@gianm @mistercrunch
This is still an issue in v0.6.6:
>>> from importlib.metadata import version
>>> version('pydruid')
'0.6.6'
To replicate:
from pydruid.db.api import rows_from_chunks
bad_json = """[
{
"id": 1,
"value": "hi"
},
{
"id": 2,
"value": "C:\\\\"
},
{
"id": 3,
"value": "this row is missing..."
}
]"""
for row in rows_from_chunks([bad_json]):
    print(f"row from bad json: {row}")
print("that's all!")
This prints:
row from bad json: OrderedDict([('id', 1), ('value', 'hi')])
that's all!
There are rows missing!
The suggested change in #262 seems to fix this problem. If I paste in the updated function definition from that PR and then rerun the above script, it prints the expected result:
row from bad json: OrderedDict([('id', 1), ('value', 'hi')])
row from bad json: OrderedDict([('id', 2), ('value', 'C:\\')])
row from bad json: OrderedDict([('id', 3), ('value', 'this row is missing...')])
that's all!
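For anyone who wants to see the idea in isolation, here is a minimal sketch of an escape-aware row splitter. It is not the code from #262 and not pydruid's implementation; `iter_rows` is a hypothetical name, and it assumes the response is a JSON array of objects delivered as text chunks. The key point is that a backslash toggles an "escaped" flag that covers exactly the next character, so a string ending in an escaped backslash still closes correctly.

```python
import json
from collections import OrderedDict


def iter_rows(chunks):
    """Yield rows from a JSON array of objects delivered as text chunks.

    Hypothetical helper for illustration only; not part of pydruid.
    """
    buf = []           # characters of the object currently being collected
    depth = 0          # nesting depth of {...} outside of strings
    in_string = False
    escaped = False    # True when the previous char was an unconsumed backslash

    for chunk in chunks:
        for char in chunk:
            if depth == 0 and char != "{":
                # Skip "[", ",", "]" and whitespace between top-level objects.
                continue
            buf.append(char)
            if in_string:
                if escaped:
                    escaped = False   # this char is consumed by the escape
                elif char == "\\":
                    escaped = True
                elif char == '"':
                    in_string = False
            elif char == '"':
                in_string = True
            elif char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth == 0:
                    yield json.loads("".join(buf), object_pairs_hook=OrderedDict)
                    buf = []


# Raw string, so the JSON text contains "C:\\" (a value ending in one backslash).
payload = r'[{"id": 1, "value": "hi"}, {"id": 2, "value": "C:\\"}, {"id": 3, "value": "last"}]'
# Split into small chunks to mimic a streamed response.
chunks = [payload[i:i + 7] for i in range(0, len(payload), 7)]
for row in iter_rows(chunks):
    print(row["id"], row["value"])
```

Because the parser state lives outside the chunk loop, a backslash at the very end of one chunk still escapes the first character of the next one.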
Hi,
I'm using pydruid.db.connector to run a query that pulls a row where the content that is returned ends in "...\\", and this appears to break pydruid, meaning it either drops rows from the data or fails with a JSONDecodeError.
e.g. "SELECT x FROM y" -> [{"x": "some row"},{"x": "...\\"},{"x": "another row"},{"x": "more rows"}]
2020-11-27 10:44:23: [CRITICAL] JSONDecodeError('Unterminated string starting at: line 1 column 85919 (char 85918)')
2020-11-27 10:44:23: [CRITICAL] Traceback (most recent call last):
  File "xxxxx", line 291, in main
    data_paths = pull_data(tracker.last_data_dt, tracker.next_data_dt)
  File "xxxxx", line 162, in pull_data
    data_path = collector.execute_and_save()
  File "xxxxx", line 226, in execute_and_save
    for i, row in enumerate(cursor):
  File "xxxxx", line 181, in _get_cursor
    raise err
  File "xxxxx", line 164, in _get_cursor
    raise err
  File "xxxxx", line 161, in _get_cursor
    r = next(cursor)
  File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 62, in g
    return f(self, *args, **kwargs)
  File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 320, in next
    return next(self._results)
  File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 370, in _stream_query
    for row in rows_from_chunks(chunks):
  File "/xxxx/venv/lib64/python3.8/site-packages/pydruid/db/api.py", line 420, in rows_from_chunks
    for row in json.loads(
  File "/usr/lib64/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/usr/lib64/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 85919 (char 85918)
Any rows following the {"x": "...\\"} row either do not return data or raise a JSONDecodeError. I'm guessing this is because pydruid.db.api.rows_from_chunks tries to parse the JSON itself, and looks for "\\" as end of strings?
I have attached a script and a dummy JSON file (scratch.zip) that shows the rows being dropped by the function, but this does not trigger the JSONDecodeError; that appears to happen only when I try to read this row and the surrounding rows from the database.
Many thanks in advance
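To make the guess above concrete, here is a small illustration of how a hand-rolled scanner could misread a string that ends in an escaped backslash. This is only a sketch of the suspected failure mode, assuming the scanner decides whether a quote is escaped by looking at the single preceding character; both function names are hypothetical and neither is taken from pydruid's source.

```python
def closes_string_naive(text, i):
    """Treat the quote at text[i] as closing unless the previous char is a backslash."""
    return text[i] == '"' and text[i - 1] != "\\"


def closes_string(text, i):
    """Treat the quote at text[i] as closing when an even number of backslashes precedes it."""
    if text[i] != '"':
        return False
    backslashes = 0
    j = i - 1
    while j >= 0 and text[j] == "\\":
        backslashes += 1
        j -= 1
    return backslashes % 2 == 0


# Raw string: the JSON value is three dots followed by ONE (escaped) backslash.
fragment = r'{"x": "...\\"}'
closing_quote = fragment.rindex('"')

print(closes_string_naive(fragment, closing_quote))  # False: the string looks unterminated
print(closes_string(fragment, closing_quote))        # True: the two backslashes pair up
```

If a scanner behaved like the naive version, it would stay "inside" the string after this row and ignore the braces that follow, which would match both symptoms: dropped rows when the input ends cleanly, and an unterminated-string error when a truncated buffer is handed to json.loads.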