elliotgao2 / gain

Web crawling framework based on asyncio.
GNU General Public License v3.0

The ``sciencenet_spider.py`` example does not (seem to) work for python 3.6 #42

Open endafarrell opened 6 years ago

endafarrell commented 6 years ago

I copied the examples/sciencenet_spider.py example and tried to run it using python 3.6 - but:

python sciencenet_spider.py
[2018:04:14 22:21:26] Spider started!
[2018:04:14 22:21:26] Using selector: KqueueSelector
[2018:04:14 22:21:26] Base url: http://blog.sciencenet.cn/
[2018:04:14 22:21:26] Item "Post": 0
[2018:04:14 22:21:26] Requests count: 0
[2018:04:14 22:21:26] Error count: 0
[2018:04:14 22:21:26] Time usage: 0:00:00.001127
[2018:04:14 22:21:26] Spider finished!
Traceback (most recent call last):
  File "sciencenet_spider.py", line 19, in <module>
    MySpider.run()
  File "/Users/endafarrell/anaconda/anaconda3/lib/python3.6/site-packages/gain/spider.py", line 52, in run
    loop.run_until_complete(cls.init_parse(semaphore))
  File "/Users/endafarrell/anaconda/anaconda3/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "/Users/endafarrell/anaconda/anaconda3/lib/python3.6/site-packages/gain/spider.py", line 71, in init_parse
    with aiohttp.ClientSession() as session:
  File "/Users/endafarrell/anaconda/anaconda3/lib/python3.6/site-packages/aiohttp/client.py", line 746, in __enter__
    raise TypeError("Use async with instead")
TypeError: Use async with instead
sys:1: RuntimeWarning: coroutine 'Parser.task' was never awaited
[2018:04:14 22:21:26] Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x105b07cf8>

My python is

python
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 12:04:33)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin

and I have:

pip list | grep gain
gain                               0.1.4

I installed gain using:

pip install gain

Any ideas?

endafarrell commented 6 years ago

Similar for python 3.5:

python sciencenet_spider.py
[2018:04:14 22:32:58] Spider started!
[2018:04:14 22:32:58] Using selector: KqueueSelector
[2018:04:14 22:32:58] Base url: http://blog.sciencenet.cn/
[2018:04:14 22:32:58] Item "Post": 0
[2018:04:14 22:32:58] Requests count: 0
[2018:04:14 22:32:58] Error count: 0
[2018:04:14 22:32:58] Time usage: 0:00:00.001171
[2018:04:14 22:32:58] Spider finished!
Traceback (most recent call last):
  File "sciencenet_spider.py", line 19, in <module>
    MySpider.run()
  File "/Users/endafarrell/anaconda/anaconda3/envs/py35/lib/python3.5/site-packages/gain/spider.py", line 52, in run
    loop.run_until_complete(cls.init_parse(semaphore))
  File "/Users/endafarrell/anaconda/anaconda3/envs/py35/lib/python3.5/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "/Users/endafarrell/anaconda/anaconda3/envs/py35/lib/python3.5/asyncio/futures.py", line 294, in result
    raise self._exception
  File "/Users/endafarrell/anaconda/anaconda3/envs/py35/lib/python3.5/asyncio/tasks.py", line 240, in _step
    result = coro.send(None)
  File "/Users/endafarrell/anaconda/anaconda3/envs/py35/lib/python3.5/site-packages/gain/spider.py", line 71, in init_parse
    with aiohttp.ClientSession() as session:
  File "/Users/endafarrell/anaconda/anaconda3/envs/py35/lib/python3.5/site-packages/aiohttp/client.py", line 746, in __enter__
    raise TypeError("Use async with instead")
TypeError: Use async with instead
sys:1: RuntimeWarning: coroutine 'Parser.task' was never awaited
[2018:04:14 22:32:58] Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x107259e48>

When python is:

python
Python 3.5.5 |Anaconda, Inc.| (default, Mar 12 2018, 16:25:05)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin

solarhell commented 6 years ago

Check this line:
https://github.com/gaojiuli/gain/commit/2c8160c92943837613a773f681fb190a8c434bb2#diff-6b9bdc895398e257e454fa60948dba08R69

Just clone the latest code. It seems the author hasn't released the latest version to PyPI. @gaojiuli
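
For readers hitting the same traceback, here is a minimal sketch of the difference between the two forms. `StandInSession` is a hypothetical stand-in written for this illustration, not aiohttp's class; it only copies the behaviour visible in the traceback above (a plain `with` raises `TypeError("Use async with instead")`), so the snippet runs without aiohttp or network access. The assumption is that the fix amounts to using `async with` around the session, as the error message asks.

```python
import asyncio


class StandInSession:
    """Hypothetical stand-in that reacts to `with` the way the traceback
    shows aiohttp.ClientSession reacting; it is not aiohttp itself."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        # Same message as in the reported traceback.
        raise TypeError("Use async with instead")

    def __exit__(self, exc_type, exc, tb):
        pass

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Mark the session closed when the async block exits.
        self.closed = True


async def broken():
    # Plain `with`, as on line 71 of the installed spider.py.
    with StandInSession() as session:
        return session


async def fixed():
    # The asynchronous form the error message asks for.
    async with StandInSession() as session:
        pass
    return session.closed


def run(coro):
    # Run one coroutine on a fresh event loop and clean the loop up.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


def demo():
    try:
        run(broken())
        broken_error = None
    except TypeError as exc:
        broken_error = str(exc)
    return broken_error, run(fixed())


if __name__ == "__main__":
    print(demo())
```

With the real library the same shape would be `async with aiohttp.ClientSession() as session:` inside a coroutine.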

endafarrell commented 6 years ago

Hi @solarhell - many thanks. After I cloned the latest code (and then added my local gain subdirectory to sys.path), the example works.
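
A rough sketch of the sys.path workaround described above, for anyone who wants to try it. The helper name `use_local_checkout` and the directory passed to it are assumptions made for illustration; the directory should be the one that contains the cloned `gain` package folder.

```python
import sys
from pathlib import Path


def use_local_checkout(checkout_dir, path=None):
    """Put a local clone ahead of site-packages on the import path.

    `checkout_dir` is a hypothetical location: the directory that contains
    the cloned `gain` package folder. `path` defaults to sys.path and is
    only a parameter so the helper can be tried on a plain list.
    """
    path = sys.path if path is None else path
    entry = str(Path(checkout_dir).resolve())
    # Drop any existing copy so the entry appears once, at the front.
    if entry in path:
        path.remove(entry)
    path.insert(0, entry)
    return entry
```

Call it before the first `import gain` in the script, so that the clone is found before the copy installed from PyPI.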

@gaojiuli - I'd love to know when PyPI is updated!

chenlei9907 commented 5 years ago

Same issue in Python 3.6:

[2019:01:30 10:04:19] Spider started!
[2019:01:30 10:04:19] Base url: http://blog.sciencenet.cn/
[2019:01:30 10:04:19] Item "Post": 0
[2019:01:30 10:04:19] Requests count: 0
[2019:01:30 10:04:19] Error count: 0
[2019:01:30 10:04:19] Time usage: 0:00:00.000988
[2019:01:30 10:04:19] Spider finished!
Traceback (most recent call last):
  File "sciencenet_spider.py", line 19, in <module>
    MySpider.run()
  File "/Users/leichen/anaconda3/lib/python3.6/site-packages/gain/spider.py", line 52, in run
    loop.run_until_complete(cls.init_parse(semaphore))
  File "uvloop/loop.pyx", line 1451, in uvloop.loop.Loop.run_until_complete
  File "/Users/leichen/anaconda3/lib/python3.6/site-packages/gain/spider.py", line 71, in init_parse
    with aiohttp.ClientSession() as session:
  File "/Users/leichen/anaconda3/lib/python3.6/site-packages/aiohttp/client.py", line 956, in __enter__
    raise TypeError("Use async with instead")
TypeError: Use async with instead
[2019:01:30 10:04:19] Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x111d2e940>
sys:1: RuntimeWarning: coroutine 'Parser.task' was never awaited

OS: Mac, Darwin Kernel Version 18.2.0
Python: Python 3.6.3 :: Anaconda custom (64-bit)
gain installed via: pip install gain

Any ideas?

rdidyk commented 5 years ago

Just install it via:

pip install -U -e git+https://github.com/gaojiuli/gain.git#egg=gain