Closed: Hardeepex closed this pull request 11 months ago
> [!TIP]
> I'll email you at hardeep.ex@gmail.com when I complete this pull request!
Here are the sandbox execution logs prior to making any changes (at commit 63f2a98):
Checking src/webscraper.py for syntax errors... ✅ src/webscraper.py has no syntax errors!
1/1 ✓
Sandbox passed on the latest main, so sandbox checks will be enabled for this issue.
I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.
src/webscraper.py
✓ https://github.com/Hardeepex/webscraper/commit/7d65b1bbd14d1d0395fefd997d8bdcea064c6f15
Modify src/webscraper.py with contents:
• Import the get_webdriver function from the selenium_grid.py script at the top of the webscraper.py script.
• Replace the requests.get(url) line in the scrape function with a call to the get_webdriver function to get a WebDriver object.
• Use the get method of the WebDriver object to send a GET request to the specified URL.
• Replace the BeautifulSoup(response.text, 'html.parser') line by reading the page_source property of the WebDriver object to get the HTML source of the webpage.
• Add a try-except block around the get method call to catch any exceptions that may be raised if the WebDriver fails to get the webpage. In the except block, print an error message that includes instructions for building and running a Selenium Grid Docker container, as specified in the README.md file.
• After getting the HTML source of the webpage, use BeautifulSoup to parse the HTML source.
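The error message in the plan points at the README's Selenium Grid instructions, which are not reproduced here. With the official Selenium Docker images, a single-node grid is typically started like this (image tag, container name, and ports are assumptions, not taken from this repository's README):

```shell
# Start a standalone Selenium Grid node with Chrome (official selenium image).
# Port 4444 serves the WebDriver/Grid endpoint; 7900 exposes noVNC to watch sessions.
docker run -d --name selenium-grid \
  -p 4444:4444 -p 7900:7900 \
  --shm-size="2g" \
  selenium/standalone-chrome:latest
```

The scraper's `get_webdriver()` would then connect a Remote WebDriver to `http://localhost:4444`.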
```diff
---
+++
@@ -1,10 +1,20 @@
 import requests
+from src.selenium_grid import get_webdriver
 from bs4 import BeautifulSoup
 
 def scrape(url):
-    response = requests.get(url)
-    soup = BeautifulSoup(response.text, 'html.parser')
+    try:
+        driver = get_webdriver()
+        driver.get(url)
+        page_source = driver.page_source
+    except Exception as e:
+        print("Failed to get page using WebDriver. Instructions for building and running a"
+              " Selenium Grid Docker container can be found in the README.md file.")
+        print(str(e))
+        return None
+    else:
+        soup = BeautifulSoup(page_source, 'html.parser')
     return soup
 
 if __name__ == "__main__":
```
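The try/except/else flow in the diff can be exercised without a running grid by injecting a stand-in driver. `StubDriver` and `scrape_with` below are a hypothetical test harness, not code from this repository:

```python
class StubDriver:
    """Stand-in for a Selenium WebDriver; the real code calls get_webdriver()."""

    def __init__(self, html, fail=False):
        self.page_source = html
        self._fail = fail

    def get(self, url):
        # Simulate a navigation failure (e.g. the grid is not running).
        if self._fail:
            raise RuntimeError(f"could not load {url}")


def scrape_with(driver, url):
    # Mirrors the diff: fetch inside try, use the result only in the else branch.
    try:
        driver.get(url)
        page_source = driver.page_source
    except Exception as e:
        print("Failed to get page using WebDriver:", e)
        return None
    else:
        return page_source  # the real code hands this to BeautifulSoup


html = scrape_with(StubDriver("<html><body>ok</body></html>"), "http://example.com")
failed = scrape_with(StubDriver("", fail=True), "http://example.com")
```

On success `scrape_with` returns the page source; on any driver failure it prints the diagnostic and returns `None`, just as the patched `scrape` does.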
src/webscraper.py
✓ Check src/webscraper.py: ran GitHub Actions for 7d65b1bbd14d1d0395fefd997d8bdcea064c6f15.
I have finished reviewing the code for completeness. I did not find errors for `sweep/tried_to_run_the_scraper_but_got_the_err`.
💡 To recreate the pull request, edit the issue title or description. To tweak the pull request, leave a comment on it.
Running the scraper still fails against rei.com:

```
~/WebstormProjects/forbes $ python3 src/webscraper.py
Access Denied
You don't have permission to access "http://www.rei.com/" on this server.
Reference #18.140a7c68.1704091323.387bf998
```
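For context on that output: the "Reference #" block page is an Akamai-style edge rejection of the default client fingerprint, which is presumably why this PR swaps requests for a real browser via Selenium. A lighter first step, sketched here with only the standard library (whether rei.com accepts it is not verified), is sending a browser-like User-Agent header:

```python
from urllib.request import Request

# A desktop-browser User-Agent string; servers that block default
# python-requests/urllib agents often pass requests carrying one.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

req = Request("http://www.rei.com/", headers={"User-Agent": BROWSER_UA})
# urllib normalizes header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

Sites with active bot detection may still block header-only spoofing, in which case driving a real browser through Selenium Grid, as this PR does, is the more robust route.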
Checklist
- [X] Modify `src/webscraper.py` ✓ https://github.com/Hardeepex/webscraper/commit/7d65b1bbd14d1d0395fefd997d8bdcea064c6f15 [Edit](https://github.com/Hardeepex/webscraper/edit/sweep/tried_to_run_the_scraper_but_got_the_err/src/webscraper.py)
- [X] Running GitHub Actions for `src/webscraper.py` ✓ [Edit](https://github.com/Hardeepex/webscraper/edit/sweep/tried_to_run_the_scraper_but_got_the_err/src/webscraper.py)