ferru97 / PyPaperBot

PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref, SciHub, and SciDB.
MIT License
405 stars, 76 forks

Year and authors from scholar. Minor code fixes. #25

Closed: kir-malishev closed this 3 years ago

kir-malishev commented 3 years ago

Not all papers have BibTeX, but the publication year of a paper can be obtained from the Scholar search results. The list of authors can be obtained the same way, although this is not always reliable, since some authors may not be displayed. I have implemented parsing of this data and made some minor code fixes. I also switched to the standard csv module. Note that empty fields are now written as an empty string, not "None". If you think it is important to keep the previous behavior, it is worth adding these checks.
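To illustrate the idea, here is a minimal sketch, not the code from this pull request. It assumes the byline under each Scholar result looks like `authors - venue, year - publisher` and that a truncated author list is marked with an ellipsis. The names `parse_byline` and `rows_to_csv` are hypothetical. It also shows that the standard `csv` writer emits a missing value (`None`) as an empty field rather than the text "None".

```python
import csv
import io
import re

# Four-digit year starting with 19 or 20 (assumption: good enough for a sketch).
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def parse_byline(byline):
    """Hypothetical helper: split an assumed Scholar-style byline.

    Assumed shape: "<authors> - <venue>, <year> - <publisher>".
    Returns (authors, truncated, year); year is None when not found.
    """
    parts = byline.replace("\xa0", " ").split(" - ")
    authors_part = parts[0]
    # An ellipsis means some authors are not displayed.
    truncated = "…" in authors_part or "..." in authors_part
    cleaned = authors_part.replace("…", "").replace("...", "")
    authors = [a.strip() for a in cleaned.split(",") if a.strip()]
    # Look for the year only after the author segment; take the last match.
    rest = " - ".join(parts[1:])
    matches = YEAR_RE.findall(rest)
    year = matches[-1] if matches else None
    return authors, truncated, year


def rows_to_csv(rows):
    """Write (title, authors, year) rows with the standard csv module."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(["title", "authors", "year"])
    for title, authors, year in rows:
        # csv.writer writes None as an empty field, not the string "None".
        writer.writerow([title, "; ".join(authors), year])
    return buf.getvalue()


if __name__ == "__main__":
    sample = "J Smith, A Doe… - Nature, 2019 - nature.com"
    authors, truncated, year = parse_byline(sample)
    print(rows_to_csv([("Paper A", authors, year), ("Paper B", ["K Lee"], None)]))
```

If the old behavior matters, the check could be as small as replacing `year` with `"None" if year is None else year` before writing the row.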

It might be worth switching from BeautifulSoup to Selenium. This library can probably help you get the citation of an article from a popup: https://scholar.google.ru/scholar?hl=ru&as_sdt=0%2C5&q=video&btnG=#d=gs_cit&u=%2Fscholar%3Fq%3Dinfo%3AZgCIxesZTisJ%3Ascholar.google.com%2F%26output%3Dcite%26scirp%3D0%26hl%3D

ferru97 commented 3 years ago

Hi!

Thanks again for the contributions. This evening I'll review all the pull requests. Yes, using Selenium should be more efficient, even if more invasive. As soon as I have some time, I'll see how feasible it is to integrate it.

kir-malishev commented 3 years ago

Please review it. These changes also fix the TypeError bug.

https://github.com/ferru97/PyPaperBot/blob/ee5b5020cb39396835ae7805483c3516169f3268/PyPaperBot/Scholar.py#L21