Closed Hardeepex closed 8 months ago
> [!TIP]
> I'll email you at hardeep.ex@gmail.com when I complete this pull request!
The sandbox appears to be unavailable or down.
I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.
scrapy_project/amazon_reviews/AmazonReviewsSpider.py
✓ https://github.com/Hardeepex/webscrapers/commit/efa8827ddfb65cf658adb3f66b998d564043989e
Create scrapy_project/amazon_reviews/AmazonReviewsSpider.py with contents:
• Create a new Scrapy spider in the Scrapy project directory. Name it AmazonReviewsSpider.py.
• Import Scrapy, Selenium, and selectolax at the top of the file.
• Define a new Scrapy spider class named AmazonReviewsSpider. This class should inherit from scrapy.Spider.
• Define the name of the spider as 'amazon_reviews'.
• Define the start_urls attribute as a list containing the URL of the Amazon product whose reviews you want to scrape.
• Define a parse method that will be called with the response for each start URL. This method should use Selenium to render the dynamic content and selectolax to parse the resulting HTML and extract the reviews.
Dockerfile
✓ https://github.com/Hardeepex/webscrapers/commit/82bee3dd4b598beb44352c99a19930fb9b835f95
Create Dockerfile with contents:
• Create a new Dockerfile in the root directory of the repository.
• Write the necessary commands to set up a Selenium Grid in the Dockerfile. This should include pulling the necessary Selenium images and setting up the Selenium Grid.
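A minimal sketch of such a Dockerfile, assuming the official Selenium images on Docker Hub. Note that a single Dockerfile can only run a standalone browser node; a full hub-plus-nodes grid is usually composed from the separate `selenium/hub` and `selenium/node-*` images (e.g. via docker-compose).

```dockerfile
# Hedged sketch — a standalone Chrome node, the simplest single-container grid.
# Image tag is an assumption; pin a specific version in practice.
FROM selenium/standalone-chrome:latest

# The Selenium WebDriver endpoint the spider connects to.
EXPOSE 4444
```

Once built and run with `docker run -p 4444:4444 <image>`, the spider can reach the grid at `http://localhost:4444/wd/hub`.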
scrapy_project/settings.py
✓ https://github.com/Hardeepex/webscrapers/commit/d8ddee315f5d924d51782339afd1f0d715444bd2
Create scrapy_project/settings.py with contents:
• Import the necessary middleware for Selenium at the top of the file.
• Add the Selenium middleware to the DOWNLOADER_MIDDLEWARES setting.
• Add the necessary settings for selectolax.
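The settings additions might look like the following. This assumes the third-party `scrapy-selenium` package; the middleware path and setting names vary between forks of that package, so treat them as placeholders.

```python
# Sketch of scrapy_project/settings.py additions — assumes the third-party
# scrapy-selenium package is installed.
BOT_NAME = "scrapy_project"

# Route requests through the Selenium middleware so pages are rendered by a
# real browser before Scrapy sees the response.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}

# Point Selenium at the remote grid started from the Dockerfile
# (host and port are placeholders).
SELENIUM_DRIVER_NAME = "remote"
SELENIUM_COMMAND_EXECUTOR = "http://localhost:4444/wd/hub"
SELENIUM_DRIVER_ARGUMENTS = ["--headless"]

# selectolax itself needs no Scrapy settings — it is imported and called
# directly in the spider's parse method.
```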
I have finished reviewing the code for completeness. I did not find errors for sweep/create_a_web_scraper_using_scrapy_select.
I want to create a web scraper using Scrapy, selectolax, and Selenium Grid. You can use a Dockerfile for the Selenium Grid. Test it by scraping Amazon product reviews.
Checklist
- [X] Create `scrapy_project/amazon_reviews/AmazonReviewsSpider.py` ✓ https://github.com/Hardeepex/webscrapers/commit/efa8827ddfb65cf658adb3f66b998d564043989e [Edit](https://github.com/Hardeepex/webscrapers/edit/sweep/create_a_web_scraper_using_scrapy_select/scrapy_project/amazon_reviews/AmazonReviewsSpider.py)
- [X] Create `Dockerfile` ✓ https://github.com/Hardeepex/webscrapers/commit/82bee3dd4b598beb44352c99a19930fb9b835f95 [Edit](https://github.com/Hardeepex/webscrapers/edit/sweep/create_a_web_scraper_using_scrapy_select/Dockerfile)
- [X] Create `scrapy_project/settings.py` ✓ https://github.com/Hardeepex/webscrapers/commit/d8ddee315f5d924d51782339afd1f0d715444bd2 [Edit](https://github.com/Hardeepex/webscrapers/edit/sweep/create_a_web_scraper_using_scrapy_select/scrapy_project/settings.py#L1-L100)