Hardeepex / webscrapers


Sweep: Create a web scraper using Scrapy, selectolax and Selenium Grid #3

Closed Hardeepex closed 8 months ago

Hardeepex commented 8 months ago

I want to create a web scraper using Scrapy, selectolax and Selenium Grid. You can use the Docker file for Selenium Grid. Test it by scraping Amazon product reviews.
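For reference, a minimal Selenium Grid can be brought up with Docker along the lines requested above. This docker-compose sketch is an assumption based on the official `selenium/standalone-chrome` image, not the repo's actual `Dockerfile`:

```yaml
# docker-compose.yml — single-container Grid with a built-in Chrome node
version: "3"
services:
  selenium:
    image: selenium/standalone-chrome:latest
    ports:
      - "4444:4444"   # WebDriver / Grid endpoint
    shm_size: "2g"    # Chrome needs a larger shared-memory segment
```

With this running, a spider can point `webdriver.Remote` at `http://localhost:4444/wd/hub`.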

Checklist

- [X] Create `scrapy_project/amazon_reviews/AmazonReviewsSpider.py` ✓ https://github.com/Hardeepex/webscrapers/commit/efa8827ddfb65cf658adb3f66b998d564043989e
- [X] Create `Dockerfile` ✓ https://github.com/Hardeepex/webscrapers/commit/82bee3dd4b598beb44352c99a19930fb9b835f95
- [X] Create `scrapy_project/settings.py` ✓ https://github.com/Hardeepex/webscrapers/commit/d8ddee315f5d924d51782339afd1f0d715444bd2
sweep-ai[bot] commented 8 months ago

🚀 Here's the PR! #4

See Sweep's progress at the progress dashboard!
💎 Sweep Pro: I'm using GPT-4. You have unlimited GPT-4 tickets. (tracking ID: f890637c2a)

[!TIP] I'll email you at hardeep.ex@gmail.com when I complete this pull request!



Sandbox execution failed

The sandbox appears to be unavailable or down.


Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant, in decreasing order of relevance. If some file is missing from here, you can mention the path in the ticket description.

- https://github.com/Hardeepex/webscrapers/blob/c1a8c5ee83eeb320aebf26d7dd5432555e4c1a6a/yielding-results/README.md#L1-L6
- https://github.com/Hardeepex/webscrapers/blob/c1a8c5ee83eeb320aebf26d7dd5432555e4c1a6a/README.md#L1-L4
- https://github.com/Hardeepex/webscrapers/blob/c1a8c5ee83eeb320aebf26d7dd5432555e4c1a6a/cli-scraper/README.md#L1-L3
- https://github.com/Hardeepex/webscrapers/blob/c1a8c5ee83eeb320aebf26d7dd5432555e4c1a6a/scrapercli/README.md#L1-L3
- https://github.com/Hardeepex/webscrapers/blob/c1a8c5ee83eeb320aebf26d7dd5432555e4c1a6a/synctoasync/README.md#L1-L4
- https://github.com/Hardeepex/webscrapers/blob/c1a8c5ee83eeb320aebf26d7dd5432555e4c1a6a/caching/main.py#L1-L31
- https://github.com/Hardeepex/webscrapers/blob/c1a8c5ee83eeb320aebf26d7dd5432555e4c1a6a/data-in-script-tags/README.md#L1-L6

Step 2: ⌨️ Coding


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/create_a_web_scraper_using_scrapy_select.


💡 To recreate the pull request, edit the issue title or description. To tweak the pull request, leave a comment on it. Join Our Discord