barneygale / httpio

Python library for reading HTTP resources as random-access file-like objects using the `Range` header
Other
31 stars 5 forks source link

Asyncio Interface #5

Closed jamesba closed 5 years ago

jamesba commented 5 years ago

This adds an additional optional asyncio compatible interface to the library for use only in versions of python above 3.5. In doing so it replicates the existing file-like interface for the most part, but many of the methods become coroutines instead, and for any method that returns a list the asynchronous equivalent is an asynchronous generator.

jamesba commented 5 years ago

@barneygale What do you think of this? There doesn't seem to be an existing standard for how to do seekable file-like io with asyncio yet so I've had to improvise somewhat.

barneygale commented 5 years ago

I've tried to get my head around python's asyncio a couple of times and found it completely baffling, so I'm not much use for a code review. Let me know if/when you're comfortable for this to be merged and I'll merge it.

jamesba commented 5 years ago

Honestly it's not well documented at all, and has been something of a moving target. Hopefully it's a bit more stable now, and better documentation and tutorials should be forthcoming.

My understanding is roughly thus:

So basically you can only use the await, async with and async for keywords inside a coroutine defined with async def. A coroutine can only be run inside a task inside a runloop. A runloop will schedule across multiple tasks each time the current one is suspended. A task will never be suspended inside ordinary synchronous code, but can be suspended inside any await, async with or async for statement.

That's how I understand it, anyway.

philipnbbc commented 5 years ago

I think open should call HTTPIOFile.open so that the behaviour is the same as before and it does what a user expects, e.g. raise an exception at the point if it fails to connect.

philipnbbc commented 5 years ago

Should httpio.async be renamed to httpio.asyncio (as you've done elsewhere) to avoid clashes with a reserved name?

philipnbbc commented 5 years ago

Could a .flake8 file be added? It has exposed 1 real issue.

jamesba commented 5 years ago

@philipnbbc care to take another look? This should have fixed all your comments.

philipnbbc commented 5 years ago

@jamesba the tests seem to take an awful lot longer after the new commits, i.e. after 52db224121c2abaa8a3639ddc41f64593425cd28. I can't figure out what change could be causing that.

Running tox -e py36 using the latest code now results in this:

GLOB sdist-make: /store1/external/jamesba-httpio/setup.py
py36 inst-nodeps: /store1/external/jamesba-httpio/.tox/dist/httpio-0.2.0.zip
py36 installed: aiohttp==3.5.4,async-timeout==3.0.1,attrs==19.1.0,certifi==2019.3.9,chardet==3.0.4,coverage==4.5.3,httpio==0.2.0,idna==2.8,idna-ssl==1.1.0,mock==2.0.0,multidict==4.5.2,pbr==5.1.3,requests==2.21.0,six==1.12.0,typing-extensions==3.7.2,urllib3==1.24.2,yarl==1.3.0
py36 runtests: PYTHONHASHSEED='1234660441'
py36 runtests: commands[0] | python -m unittest discover -s tests -p test*.py
Executing <Task finished coro=<TestAsyncHTTPIOFile.test_tell_starts_at_zero() done, defined at /store1/external/jamesba-httpio/tests/test_async_httpio.py:268> result=None created at /usr/lib/python3.7/asyncio/base_events.py:563> took 127.231 seconds
.Executing <Task finished coro=<test_readline() done, defined at /store1/external/jamesba-httpio/tests/test_async_httpio.py:247> result=None created at /usr/lib/python3.6/asyncio/base_events.py:452> took 127.231 seconds
.Executing <Task finished coro=<TestAsyncHTTPIOFile.test_throws_exception_when_get_returns_error() done, defined at /store1/external/jamesba-httpio/tests/test_async_httpio.py:192> result=None created at /usr/lib/python3.7/asyncio/base_events.py:563> took 127.231 seconds
.Executing <Task finished coro=<test_readlines() done, defined at /store1/external/jamesba-httpio/tests/test_async_httpio.py:254> result=None created at /usr/lib/python3.6/asyncio/base_events.py:452> took 127.230 seconds
.Executing <Task finished coro=<test_aiter() done, defined at /store1/external/jamesba-httpio/tests/test_async_httpio.py:261> result=None created at /usr/lib/python3.6/asyncio/base_events.py:452> took 127.304 seconds
...

at which point I cancelled it.

If I run it at commit 52db224121c2abaa8a3639ddc41f64593425cd28 it takes around 30 seconds.

Do you see the same thing?

jamesba commented 5 years ago

@philipnbbc I did, I thought it was a local problem with the network drive I was using. Right, time to try and do a git bisect to work out what has caused this.

philipnbbc commented 5 years ago

The changes LGTM however. The only thing I spotted that might need changing is async def close t check that self._session is not None for case where open was not called.

jamesba commented 5 years ago

Bisect identifies commit 9a3751b9e839da68361c378d7ad7abae71bc5c61 as having introduced the problem. I will take a look at cutting that commit up further on Monday.

jamesba commented 5 years ago

@philipnbbc I have identified and reverted the change that caused the timing problem. I will explore further to try and work out how it caused that, and push a version without the problem.

jamesba commented 5 years ago

@philipnbbc this should now be working

jamesba commented 5 years ago

@philipnbbc @barneygale right, all changes made, history cleaned up. Ready to merge.

philipnbbc commented 5 years ago

@jamesba LGTM.

barneygale commented 5 years ago

Thanks both