ICRAR / ijson

Iterative JSON parser with Pythonic interfaces
http://pypi.python.org/pypi/ijson/

Event interception not available on async functions #48

Open fcavalieri opened 3 years ago

fcavalieri commented 3 years ago

I am trying to implement the Intercepting Events pattern from https://github.com/ICRAR/ijson#id13 to consume an aiohttp response. When using non-async sources everything works as expected.

Running Python 3.9.4 (on Kubuntu 20.04), ijson 3.1.4, aiohttp 3.7.4.post0. To test all backends I also installed cffi 1.14.5 and the OS package libyajl2:amd64 2.1.0-3. The precise versions do not seem crucial. The code below fetches a JSON file from the web; neither the specific JSON data nor the path used in the code is important.

import asyncio
import traceback

import aiohttp
import ijson

url = 'https://support.oneskyapp.com/hc/en-us/article_attachments/202761727/example_2.json'

async def run():
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            parse_events = ijson.parse_async(response.content)  # 1 <----
            async for prefix, event, value in parse_events:
                print(prefix, event, value)
        async with session.get(url) as response:
            async for i in ijson.items_async(response.content, "quiz.maths.q2.options.item"):  # 2 <----
                print(i)
        async with session.get(url) as response:
            body = await response.read()
            parse_events = ijson.parse(body)
            for i in ijson.items(parse_events, "quiz.maths.q2.options.item"):  # 3 <----
                print(i)
        for backend in ['yajl2_c', 'yajl2_cffi', 'yajl2', 'python']:
            try:
                ijson_backend = ijson.get_backend(backend)
                async with session.get(url) as response:
                    parse_events = ijson_backend.parse_async(response.content)
                    async for i in ijson_backend.items_async(parse_events, "quiz.maths.q2.options.item"):   # 4 <----
                        print(i)
            except Exception:
                print(f"{backend}\n\n\n{traceback.format_exc()}\n\n\n")

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(run())

Cases 1, 2, and 3 work fine. Case 4 raises a different exception depending on the backend:

yajl2_c

Traceback (most recent call last):
  File "/home/federico/python-tests/test.py", line 23, in run
    async for i in ijson_backend.items_async(parse_events, "quiz.maths.q2.options.item"):
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/utils35.py", line 20, in _get_read
    if type(await f.read(0)) == compat.bytetype:
AttributeError: '_yajl2._parse_async' object has no attribute 'read'

yajl2_cffi

Traceback (most recent call last):
  File "/home/federico/python-tests/test.py", line 23, in run
    async for i in ijson_backend.items_async(parse_events, "quiz.maths.q2.options.item"):
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/utils35.py", line 48, in __anext__
    self.read = await _get_read(self.f)
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/utils35.py", line 20, in _get_read
    if type(await f.read(0)) == compat.bytetype:
TypeError: 'NoneType' object is not callable

Exception ignored in: <generator object basic_parse_basecoro at 0x7fb0f35bfdd0>
Traceback (most recent call last):
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/yajl2_cffi.py", line 225, in basic_parse_basecoro
    yajl_parse(handle, buffer)
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/yajl2_cffi.py", line 196, in yajl_parse
    raise exception(error)
ijson.common.IncompleteJSONError: parse error: premature EOF

                     (right here) ------^

yajl2

Traceback (most recent call last):
  File "/home/federico/python-tests/test.py", line 23, in run
    async for i in ijson_backend.items_async(parse_events, "quiz.maths.q2.options.item"):
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/utils35.py", line 48, in __anext__
    self.read = await _get_read(self.f)
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/utils35.py", line 20, in _get_read
    if type(await f.read(0)) == compat.bytetype:
TypeError: 'NoneType' object is not callable

Exception ignored in: <generator object basic_parse_basecoro at 0x7fb0f35bfcf0>
Traceback (most recent call last):
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/yajl2_cffi.py", line 225, in basic_parse_basecoro
    yajl_parse(handle, buffer)
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/yajl2_cffi.py", line 196, in yajl_parse
    raise exception(error)
ijson.common.IncompleteJSONError: parse error: premature EOF

                     (right here) ------^

Exception ignored in: <generator object basic_parse_basecoro at 0x7fb0f35bf970>
Traceback (most recent call last):
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/yajl2.py", line 50, in basic_parse_basecoro
    raise exception(error)
ijson.common.IncompleteJSONError: parse error: premature EOF

                     (right here) ------^

python

Traceback (most recent call last):
  File "/home/federico/python-tests/test.py", line 23, in run
    async for i in ijson_backend.items_async(parse_events, "quiz.maths.q2.options.item"):
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/utils35.py", line 48, in __anext__
    self.read = await _get_read(self.f)
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/utils35.py", line 20, in _get_read
    if type(await f.read(0)) == compat.bytetype:
TypeError: 'NoneType' object is not callable

Exception ignored in: <generator object basic_parse_basecoro at 0x7fb0f35bfe40>
Traceback (most recent call last):
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/yajl2.py", line 50, in basic_parse_basecoro
    raise exception(error)
ijson.common.IncompleteJSONError: parse error: premature EOF

                     (right here) ------^

Exception ignored in: <generator object utf8_encoder at 0x7fb0f3587200>
Traceback (most recent call last):
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/python.py", line 46, in utf8_encoder
    target.close()
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/python.py", line 116, in Lexer
    target.send(EOF)
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/python.py", line 161, in parse_value
    raise common.IncompleteJSONError('Incomplete JSON content')
ijson.common.IncompleteJSONError: Incomplete JSON content
Exception ignored in: <generator object utf8_encoder at 0x7fb0f35870b0>
Traceback (most recent call last):
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/python.py", line 46, in utf8_encoder
    target.close()
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/python.py", line 116, in Lexer
    target.send(EOF)
  File "/home/federico/python-tests/.venv/lib/python3.9/site-packages/ijson/backends/python.py", line 161, in parse_value
    raise common.IncompleteJSONError('Incomplete JSON content')
ijson.common.IncompleteJSONError: Incomplete JSON content

I tried a few combinations of parse_async/parse/items_async/items/async for/for, but without luck.

Am I doing something wrong, or is there an issue?

rtobar commented 3 years ago

@fcavalieri thanks for noting this.

To answer your final question: you are doing nothing wrong, and yet at the same time there is no actual issue. The problem is one of wrong expectations: the event interception mechanism works only for the generator (i.e., sync) functions. This is not really stated in the documentation though, and that's something I can fix right now.

I think it should be possible to implement event interception for the async functions in the future, but that would require some more work. I'll change the title of this issue to reflect the lack of this feature, but other than that I can't promise this is something I'll be doing any time soon.