Hello @NadieFiind o7 Thank you very much for your contribution.
Why don't we use `ensure_future` to build asynchronous tasks instead of building an async generator?
Something like:
```python
async def search_pages(self, query: str, sort: str = None, max_pages: int = 1) -> List[SearchPage]:
    TASKS = []

    for page in range(1, max_pages + 1):
        task = asyncio.ensure_future(self.search(query=query, sort=sort, page=page))
        TASKS.append(task)

    return await asyncio.gather(*TASKS)
```
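For comparison, the async generator approach would look roughly like this (a sketch based on the `search_pages` call in the test below, not necessarily the exact code in the PR):

```python
from typing import AsyncIterator, Optional


async def search_pages(self, query: str, sort: Optional[str] = None,
                       max_pages: int = 1) -> AsyncIterator[SearchPage]:
    # Each page is awaited before the next request is sent,
    # so the requests run one after another rather than concurrently.
    for page in range(1, max_pages + 1):
        yield await self.search(query=query, sort=sort, page=page)
```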
I did some testing to see which one is faster. As expected, the `ensure_future` version is faster, but it might strain your network because it makes many requests at the same time.
```python
import time
import asyncio

from NHentai.nhentai_async import NHentaiAsync


async def main():
    pages = 2
    nhentai = NHentaiAsync()

    print(f"Pages: {pages}")

    # Test the speed of the async generator.
    start_time = time.time()
    async for page in nhentai.search_pages(query="a", max_pages=pages):
        pass
    print(f"Async Generator: {time.time() - start_time}")

    # Test the speed of ensure_future.
    start_time = time.time()
    for page in await nhentai.list_search_pages(query="a", max_pages=pages):
        pass
    print(f"Ensure Future : {time.time() - start_time}")


asyncio.run(main())
```
```
Pages: 10
Async Generator: 9.494192361831665
Traceback (most recent call last):
  File "/home/nadie/MyFiles/Others/Playground/NHentai-API/run.py", line 21, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/nadie/MyFiles/Others/Playground/NHentai-API/run.py", line 17, in main
    for page in await nhentai.list_search_pages(query="a", max_pages=pages):
  File "/home/nadie/MyFiles/Others/Playground/NHentai-API/NHentai/nhentai_async.py", line 210, in list_search_pages
    return await asyncio.gather(*TASKS)
  File "/home/nadie/MyFiles/Others/Playground/NHentai-API/NHentai/nhentai_async.py", line 162, in search
    total_results = soup.find('div', id='content').find('h1').text.strip().split()[0]
AttributeError: 'NoneType' object has no attribute 'find'
```
```
Pages: 2
Async Generator: 1.8263447284698486
Ensure Future : 1.04958176612854

Pages: 2
Async Generator: 1.7395751476287842
Ensure Future : 0.6928927898406982
```
As you can see, it worked fine with a smaller number of pages. I have no idea why the soup is returning `None` with a bigger number of pages. I added a print in the `NHentaiAsync.search` method to see which soups are actually returning `None`, and this is what I got:
```
Pages: 10
Not None
Not None
Not None
Not None
Not None
Not None
Not None
Not None
Not None
Not None
Async Generator: 9.730764865875244
None
None
None
Not None
Not None
Not None
Not None
Not None
Not None
Not None
Ensure Future : 1.2536113262176514
```
I'll investigate why the search method sometimes returns `None`. Thanks for the report.
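In the meantime, a defensive check around the parsing would at least surface what the site actually returned instead of crashing with an `AttributeError`. A minimal sketch of the idea (the helper name is hypothetical, not code from the library):

```python
def _parse_total_results(soup) -> str:
    # soup.find('div', id='content') comes back as None when the response
    # is not the expected search page (for example an error page), so fail
    # with a clear message instead of an AttributeError.
    content = soup.find('div', id='content')
    if content is None:
        raise ValueError('unexpected response: search page content missing')
    return content.find('h1').text.strip().split()[0]
```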
I'm working on migrating the API from a plain web scraper to an nhentai API wrapper. It will improve the consistency of the methods' results.
About `ensure_future` vs the async generator: I understood your points, and I agree with your changes to use the async generator. Let's continue with this strategy.
୧☉□☉୨
I made a new commit. Please read the description.
I played around with the `concurrent_tasks` argument. Below 7 concurrent tasks I don't get any `None` errors, but with 7 or more concurrent tasks I start getting `None` errors.
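The limiting is basically a semaphore around each request inside the async generator. A rough sketch of the idea (simplified, assuming the same `search` method as above, not the exact code from the commit):

```python
import asyncio
from typing import Optional


async def search_pages(self, query: str, sort: Optional[str] = None,
                       max_pages: int = 1, concurrent_tasks: int = 5):
    # At most `concurrent_tasks` requests are in flight at once;
    # the semaphore makes the remaining tasks wait for a free slot.
    semaphore = asyncio.Semaphore(concurrent_tasks)

    async def limited_search(page: int):
        async with semaphore:
            return await self.search(query=query, sort=sort, page=page)

    tasks = [asyncio.ensure_future(limited_search(page))
             for page in range(1, max_pages + 1)]

    # Yield each page as soon as it finishes downloading.
    for task in asyncio.as_completed(tasks):
        yield await task
```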
Get multiple search pages at once, up to a maximum number of pages, using an async generator.