IliaZenkov / async-pubmed-scraper

PubMed scraper for async search on a list of keywords and concurrent extraction of all found URLs, returning a DataFrame/CSV containing all article data (title, abstract, authors, affiliations, etc)
MIT License

error when running the command #3

Open yudeng2022 opened 1 year ago

yudeng2022 commented 1 year ago

Hi, thank you so much for providing this tool! I ran into the following error the second time I ran the command. Do you know how to fix it? Much appreciated!


Traceback (most recent call last):
  File "async_pubmed_scraper.py", line 268, in <module>
    loop.run_until_complete(build_article_urls(search_keywords))
  File "C:\Users\DENGYX3\Anaconda3\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "async_pubmed_scraper.py", line 215, in build_article_urls
    await asyncio.gather(*tasks)
  File "async_pubmed_scraper.py", line 171, in get_pmids
    pmids = soup.find('meta',{'name':'log_displayeduids'})['content']
TypeError: 'NoneType' object is not subscriptable
yudeng2022 commented 1 year ago

I noticed this happens when I set the number of pages too high, for example: python async_pubmed_scraper --pages 10000 --start 2018 --stop 2020 --output article_data.

Is there any way I can get all the abstracts related to the keywords without setting the number of pages?
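For context, the TypeError in the traceback occurs because BeautifulSoup's soup.find() returns None when no matching tag exists, which happens when a requested results page is past the last page PubMed actually returns, and subscripting None with ['content'] then fails. A minimal None-guard sketch of the fix (the helper name extract_pmids is hypothetical, not from the scraper; since BeautifulSoup Tag objects support dict-style item access, a plain dict stands in for a Tag here):

```python
def extract_pmids(meta_tag):
    """Return the PMID list from the 'log_displayeduids' meta tag.

    soup.find() returns None when the tag is absent (e.g. the page
    number exceeds the available search results), so guard before
    subscripting to avoid "'NoneType' object is not subscriptable".
    """
    if meta_tag is None:
        return []  # no results on this page; skip it instead of crashing
    return meta_tag['content'].split(',')


# Usage sketch inside get_pmids, assuming `soup` is already parsed:
# pmids = extract_pmids(soup.find('meta', {'name': 'log_displayeduids'}))
```

With a guard like this, oversized --pages values would simply yield empty pages rather than aborting the whole asyncio.gather() run.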