Clueless-Community / scrape-up

A web-scraping Python package that enables you to scrape data from platforms like GitHub, Twitter, Instagram, or any other useful website.
https://pypi.org/project/scrape-up/
MIT License
243 stars, 248 forks

Scraping support for Reddit [GSSoC'23] #173

Prachi-Jain01 closed this issue 12 months ago

Prachi-Jain01 commented 1 year ago

Proposed Method:

Create a wrapper over the Reddit API to add support for scraping Reddit.

Directory:

scrape-up/src/scrape_up/reddit
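
A minimal sketch of what such a wrapper might look like, assuming Reddit's public JSON listing endpoints (`/r/<subreddit>/hot.json`) are used. The class name `RedditSubreddit`, its methods, and the `User-Agent` string are hypothetical and are not part of scrape-up's actual API.

```python
import json
import urllib.request


class RedditSubreddit:
    """Hypothetical wrapper around Reddit's public JSON listings (illustration only)."""

    BASE_URL = "https://www.reddit.com"

    def __init__(self, name, user_agent="scrape-up-example/0.1"):
        self.name = name
        # Reddit expects a descriptive User-Agent; this value is a placeholder.
        self.user_agent = user_agent

    def _fetch_listing(self, sort="hot", limit=5):
        """Download one listing page and return the decoded JSON payload."""
        url = f"{self.BASE_URL}/r/{self.name}/{sort}.json?limit={limit}"
        request = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)

    @staticmethod
    def parse_listing(payload):
        """Reduce a listing payload to a list of small post dictionaries."""
        posts = []
        for child in payload.get("data", {}).get("children", []):
            data = child.get("data", {})
            posts.append(
                {
                    "title": data.get("title"),
                    "author": data.get("author"),
                    "score": data.get("score"),
                    "permalink": data.get("permalink"),
                }
            )
        return posts

    def hot_posts(self, limit=5):
        """Fetch and parse the subreddit's current hot posts."""
        return self.parse_listing(self._fetch_listing("hot", limit))
```

With this sketch, `RedditSubreddit("python").hot_posts(limit=3)` would return up to three dictionaries with `title`, `author`, `score`, and `permalink` keys. Keeping the parsing in `parse_listing` separate from the network call makes it testable without hitting Reddit.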

I would like to work on this issue as a part of GSSoC'23. @nikhil25803 Could you please assign it to me?

Prachi-Jain01 commented 1 year ago

@nikhil25803 Could you please assign this issue to me?

shubham725809 commented 1 year ago

Please assign this to me.

nikhil25803 commented 1 year ago

Sure @Prachi-Jain01, go give it a try :))

BabarRasheed commented 1 year ago

Hi, I'm Babar Rasheed (GSSoC'23 contributor). Many websites don't offer an API, so we can use web scraping to access their data in an easy, structured manner. Python libraries like BeautifulSoup (bs4), Scrapy, and Selenium are generally used for web scraping. I'd like to apply these libraries and use multiprocessing to speed up the scraping. Multiprocessing is most helpful when many URLs have to be scraped, because the URLs are processed in parallel, which saves time.
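
To illustrate the multiprocessing idea, here is a rough sketch that scrapes the page title from several URLs in parallel with `multiprocessing.Pool`. It uses only the standard library (a regular expression instead of BeautifulSoup) to stay self-contained; the function names `extract_title`, `scrape_title`, and `scrape_many` and the `User-Agent` string are made up for this example.

```python
import re
import urllib.request
from multiprocessing import Pool

# Matches the contents of the first <title> element (good enough for a sketch).
TITLE_PATTERN = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> tag, or None if there is none."""
    match = TITLE_PATTERN.search(html)
    return match.group(1).strip() if match else None


def scrape_title(url):
    """Fetch one URL and return (url, title); network errors yield (url, None)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "scrape-up-example/0.1"}  # placeholder value
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
    except OSError:
        return url, None
    return url, extract_title(html)


def scrape_many(urls, processes=4):
    """Scrape all URLs with a pool of worker processes, preserving input order."""
    with Pool(processes=processes) as pool:
        return pool.map(scrape_title, urls)
```

Call `scrape_many([...])` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module on some platforms. Because fetching pages is mostly waiting on the network, a thread pool may give a similar speed-up with less overhead; processes help more when the parsing itself is CPU-heavy.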