Amazon India Scraper Support [Gssoc'23]

Clueless-Community / scrape-up

A web-scraping-based python package that enables you to scrape data from various platforms like GitHub, Twitter, Instagram, or any useful website.

https://pypi.org/project/scrape-up/

MIT License

249 stars, 243 forks

Amazon India Scraper Support [Gssoc'23] #203

Closed: paritoshtripathi935 closed this issue 1 year ago

paritoshtripathi935 commented 1 year ago

Amazon Scraper Support based on Categories or Product ID

Can help in collecting data for graph-based analysis.
Can help in product matching.
Can help in price tracking for products.

Solution

A module where you pass categories and the data is returned in a format such as CSV or JSON.
Can include scraping the details of a particular product on the basis of the SKU ID provided.
Support for multithreading for high-speed scraping.
A fallback mechanism, such as Selenium, for when requests get blocked.
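
The solution points above (concurrent fetching, a fallback once blocking occurs, CSV/JSON output) could be sketched roughly as follows. This is only an illustration and not the code from any PR: `BlockedError`, `fetch_with_fallback`, `scrape_many`, `to_csv`, and `to_json` are hypothetical names, and the two fetchers are passed in as plain callables so that, for example, a `requests`-based fetcher and a Selenium-based one could be plugged in.

```python
import csv
import io
import json
from concurrent.futures import ThreadPoolExecutor


class BlockedError(Exception):
    """Hypothetical error a fetcher raises when the site refuses the request."""


def fetch_with_fallback(url, primary, fallback):
    """Try the fast fetcher first; use the slower one only if it is blocked."""
    try:
        return primary(url)
    except BlockedError:
        return fallback(url)


def scrape_many(urls, primary, fallback, max_workers=8):
    """Fetch all URLs in a thread pool; results keep the order of `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda u: fetch_with_fallback(u, primary, fallback), urls)
        )


def to_json(rows):
    """Serialize a list of dicts as a JSON string."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize a list of dicts as CSV, using the first row's keys as header."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Threads suit this step because fetching is mostly waiting on the network. Keeping the fetchers as parameters means the fallback stays optional and the pool logic can be exercised without touching a real site.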

-- please allow me to work on this issue under Gssoc'23.

nikhil25803 commented 1 year ago

Great idea, go ahead. Note:

Create a separate module for this, following the existing folder and project structure.
Do not add all methods at once; add one or two first, such as methods to list all mobiles or laptops available on Amazon.

All the best

paritoshtripathi935 commented 1 year ago

Thanks @nikhil25803, on it.

Abhinavcode13 commented 1 year ago

Can I work on this issue too, @nikhil25803?

paritoshtripathi935 commented 1 year ago

@nikhil25803 I have opened a PR, please have a look.

BabarRasheed commented 1 year ago

Hi, I'm Babar Rasheed (GSSoC'23 contributor). Many websites don't offer an API, so web scraping is a way to access their data in an easy and structured manner. Python libraries such as BeautifulSoup (bs4), Scrapy, and Selenium are commonly used for this. I'd like to apply these libraries together with multiprocessing to speed up the scraping. Multiprocessing is most helpful when many URLs have to be scraped: several URLs are processed in parallel, which saves time.
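
A minimal sketch of the multiprocessing idea, under some assumptions: the pages are already downloaded and held as strings, the standard-library `html.parser` stands in for BeautifulSoup so the example has no dependencies, and `TitleParser`, `extract_title`, and `extract_titles` are hypothetical names. Processes mainly pay off for the CPU-bound parsing step; the download itself is usually network-bound.

```python
from html.parser import HTMLParser
from multiprocessing import Pool


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Parse one page and return its title with surrounding whitespace removed."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def extract_titles(pages, processes=2):
    """Parse many pages in parallel worker processes, keeping input order."""
    with Pool(processes=processes) as pool:
        return pool.map(extract_title, pages)


if __name__ == "__main__":
    # Stand-in pages; a real run would pass downloaded product pages.
    pages = [
        "<html><head><title>Phone A</title></head></html>",
        "<html><head><title>Laptop B</title></head></html>",
    ]
    print(extract_titles(pages))
```

The `if __name__ == "__main__":` guard matters on platforms that start worker processes by re-importing the module.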

paritoshtripathi935 commented 1 year ago

@BabarRasheed multiprocessing is already implemented, please have a look at the PR I opened. Can you suggest any further improvements on that?

nikhil25803 commented 1 year ago

Hey @paritoshtripathi935 | A class has already been created for this in #307, please add your method to the same class. Do not create a new one.

paritoshtripathi935 commented 1 year ago

okay @nikhil25803

nikhil25803 commented 1 year ago

Hey @Abhinavcode13, @BabarRasheed | You guys can work on the same class created in #307 if you want :))