Multithreading/parallelize?

ferru97 / PyPaperBot

PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref, SciHub, and SciDB.

MIT License


Multithreading/parallelize? #5

Open Niemand112233 opened 4 years ago

Niemand112233 commented 4 years ago

Hi, your software works great, but it is a little slow when searching for queries on Google Scholar. Is it possible to parallelize, for example, the search across the individual result pages?

ferru97 commented 4 years ago

Hi @Niemand112233, yes. For the next version (as soon as I have time or new collaborators), I'll try to introduce multithreading (even though it's not Python's strongest feature), at least during downloads, so that the tool keeps searching while papers are being downloaded.
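For illustration only, here is a minimal sketch of that idea: downloads are handed to a thread pool while the search loop keeps going. It is not PyPaperBot's code; `search_pages`, `download_paper`, and `search_and_download` are hypothetical stand-ins that simulate network latency with `time.sleep`. Because downloads are I/O-bound, Python threads can overlap them even with the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def search_pages(num_pages):
    """Hypothetical stand-in for the search step: yields one paper title per page."""
    for page in range(num_pages):
        time.sleep(0.01)  # simulate the latency of a search request
        yield f"paper-{page}"


def download_paper(title):
    """Hypothetical stand-in for a download: returns the file name it would write."""
    time.sleep(0.02)  # simulate a slow download
    return f"{title}.pdf"


def search_and_download(num_pages, max_workers=4):
    """Submit each download to a thread pool as soon as the search yields a paper."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The generator keeps searching while earlier downloads run in worker threads.
        futures = [pool.submit(download_paper, title) for title in search_pages(num_pages)]
        # Collect results in submission order; result() blocks until each one is done.
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(search_and_download(3))
```

The pool size (`max_workers=4`) is an arbitrary choice for the sketch; a real value would depend on how many parallel requests the remote sites tolerate.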

There is also a random delay of 1-10 seconds after each request to Crossref (when you see "Searching paper x of y on Crossref.." in the prompt). I added it because I think it helps avoid getting blocked by Crossref for API abuse. In the next version, I think I'll make this delay optional.
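An optional delay of this kind could look roughly like the sketch below. The function name `polite_pause` and its parameters are assumptions made for illustration, not part of PyPaperBot; the `sleep` argument exists only so the wait can be replaced in tests.

```python
import random
import time


def polite_pause(enabled=True, min_seconds=1.0, max_seconds=10.0, sleep=time.sleep):
    """Wait a random interval between requests unless disabled; return the seconds waited."""
    if not enabled:
        return 0.0  # the user opted out of the delay
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)  # hypothetical hook: defaults to time.sleep
    return delay


if __name__ == "__main__":
    # Short bounds here so the demo finishes quickly.
    print(polite_pause(min_seconds=0.1, max_seconds=0.2))
    print(polite_pause(enabled=False))
```

Calling `polite_pause(enabled=False)` after each Crossref request would skip the wait entirely, at the risk of hitting rate limits sooner.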

creepy-pasta101 commented 3 years ago

I guess multithreading could be introduced by alternating between Crossref queries and SciHub downloads.

ferru97 commented 3 years ago

Yes, of course, even if that would not give the maximum possible gain.

ferru97 commented 1 month ago

Hi, sorry for the late reply, but PyPaperBot had been on standby until now. I've decided to resume development to improve it.

I've created a Telegram channel where you can suggest improvements, report bugs, or request custom data mining scripts. If you're interested, feel free to join using the following LINK

Thanks!