Closed carbbin closed 2 months ago
Hi @carbbin, can you post the full traceback? It looks like you've cut off the error right at the start of the actual error description.
Thanks!
Hi @SarthakJShetty,
Yes sorry for that. Here it is:
File "c:\Users.venv\Lib\site-packages\bs4\__init__.py", line 315, in __init__
    elif len(markup) <= 256 and (
         ^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()
Thank you for reporting this error. Indeed, it looks like there must have been site-wide changes at Elsevier which are preventing the page retrieval. I fear that it may not be possible to even retrieve the HTML anymore. I will try some other ways to retrieve the HTML and get back to you on this. This was also reported in #11
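For anyone hitting this in the meantime: the `TypeError` happens because the page fetch comes back empty, so `None` gets handed straight to BeautifulSoup, which then calls `len()` on it. A minimal defensive wrapper would surface the real problem instead (the name `safe_soup` is just illustrative, it is not part of pyResearchInsights):

```python
def safe_soup(html, parser="html.parser"):
    """Parse HTML, but fail loudly if the fetch returned nothing."""
    if html is None:
        # This is the situation behind the reported TypeError:
        # the site blocked the request, so no markup came back.
        raise RuntimeError("Page retrieval returned no HTML; "
                           "the site may be blocking automated requests")
    from bs4 import BeautifulSoup  # imported lazily so the None guard needs no bs4
    return BeautifulSoup(html, parser)
```

With something like this around the `bs(url_reader(...), 'html.parser')` call, you'd see a message about the blocked fetch instead of the opaque `len()` error.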
Ok, ty very much @SarthakJShetty!
Hey @SarthakJShetty, did you manage to retrieve the HTML differently, and if yes, are you planning to implement it? I'm looking for a tool like yours atm...
Hi @SebastianLeimbacher! Thank you for trying out pyResearchInsights. I'll be taking a look today and try to get back to you. Apologies for the delay, the situation looks to be a bit more tricky than I anticipated at first :sweat:
Hi @SarthakJShetty Do you have any update?
Hi @devans18 and @SebastianLeimbacher and @sustianovich
Sorry for the delay, but I've finally figured this out. I will build and post a new package in a few hours and get back on this issue with an update.
Thank you for being patient. The v1.60 release should solve this issue. Feel free to reopen this issue if you still run into it.
Hi!
I am working with python 3.11.7.
I created a virtual environment, and after the error appeared I also installed lower versions (beautifulsoup4==4.12.2 and bs4==0.0.1) to see if that was the cause.
It creates the LOGS folder for me, and I already have the NLTK_DATA folder.
What could the error be?
The error: ##################
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[INFO]12:20:32 Built LOG folder for session
[INFO]12:20:32 https://link.springer.com/search/page/ start_url has been received
[INFO]12:20:32 https://link.springer.com/search/page/0?facet-content-type="Article"&query=Western+Ghats+Conservation&facet-language="En" has been obtained
Traceback (most recent call last):
  File "c:\Users\vscode\pyresearch\data.py", line 16, in <module>
    scraper_main(keywords_to_search, abstracts_log_name, status_logger_name)
  File "c:\Users\vscode\pyresearch.venv\Lib\site-packages\pyResearchInsights\Scraper.py", line 396, in scraper_main
    urls_to_scrape = url_generator(start_url, query_string, status_logger_name)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\vscode\pyresearch.venv\Lib\site-packages\pyResearchInsights\Scraper.py", line 65, in url_generator
    test_soup = bs(url_reader(total_url, status_logger_name), 'html.parser')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\vscode\pyresearch.venv\Lib\site-packages\bs4\__init__.py", line 315, in __init__
    elif len(markup) <= 256 and (
##################
The code: ##################
from pyResearchInsights.common_functions import pre_processing
from pyResearchInsights.Scraper import scraper_main

'''Abstracts containing these keywords will be queried from Springer'''
keywords_to_search = "Western Ghats Conservation"

'''Calling the pre_processing functions here so that abstracts_log_name and status_logger_name is available across the code.'''
abstracts_log_name, status_logger_name = pre_processing(keywords_to_search)

'''Runs the scraper here to scrape the details from the scientific repository'''
scraper_main(keywords_to_search, abstracts_log_name, status_logger_name)
##################
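Note that this traceback is the same failure mode reported at the top of the thread: `url_reader()` returns `None` when the page can't be fetched, and bs4's constructor then calls `len(None)`. The root cause can be reproduced with no network access and no bs4 install (the `parse` function below only mimics bs4's length check, it is not real bs4 code):

```python
def parse(markup):
    # Mimics the check bs4 performs around bs4/__init__.py line 315
    if len(markup) <= 256:
        pass

try:
    parse(None)  # what happens when url_reader() returns None
except TypeError as e:
    print(e)  # -> object of type 'NoneType' has no len()
```

So downgrading beautifulsoup4 or bs4 won't help here; the problem is on the page-retrieval side, which is what the v1.60 release mentioned earlier in this thread addressed.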