Clueless-Community / scrape-up

A web-scraping-based Python package that enables you to scrape data from various platforms like GitHub, Twitter, Instagram, or any useful website.
https://pypi.org/project/scrape-up/
MIT License
251 stars, 245 forks

Bug: Redevelopment of the Quora scrapper #918

Closed Saurabh254 closed 1 month ago

Saurabh254 commented 4 months ago

Describe the feature

As a GSSoC'24 contributor, I want to improve my development skills by working on scrape-up, and I'll be working on this issue myself. Note that I'm a contributor to the Python package pyquora (a Quora scraper).

I'm working on this because pyquora lacks some features, such as fetching answers by search query.

I would also like to make the scrape-up Quora scraper better.

The parts I'll be covering are:

Add ScreenShots

I will cover every detail from the image below: 2024-05-12-21:34:49-screenshot

I will also cover the top answers:

2024-05-12-21:35:45-screenshot

viththagi commented 4 months ago

Hi @Saurabh254, I would like to work on this issue. My steps would be:

1. Web scraping: use libraries such as BeautifulSoup and Selenium.
2. Understand the website structure: inspect the HTML of the comments section.
3. Efficiency considerations: add delays between requests to avoid overloading the website's server, and handle pagination.
4. Sentiment analysis: use libraries like TextBlob or NLTK.
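
The delay and pagination step could look roughly like the sketch below. It is only an illustration: `fetch_all_pages` and the `fetch_page` callback are hypothetical names, and the cursor-style pagination is an assumption, not something taken from scrape-up or from Quora's actual responses.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical callback type: takes a cursor (None for the first page) and
# returns (items_on_this_page, next_cursor_or_None).
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def fetch_all_pages(
    fetch_page: FetchPage,
    delay_seconds: float = 1.0,
    max_pages: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield items page by page, pausing between requests."""
    cursor: Optional[str] = None
    for page_number in range(max_pages):
        if page_number > 0:
            # Wait before every request except the first one,
            # so the server is not hit in a tight loop.
            sleep(delay_seconds)
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            # No further pages were reported.
            break
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting, and `max_pages` puts an upper bound on how many requests a single call can make.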

Saurabh254 commented 4 months ago

@nikhil25803 you can assign me this task. :)

Saurabh254 commented 4 months ago

> Hi @Saurabh254, I would like to work on this issue. My steps would be:
>
> 1. Web scraping: use libraries such as BeautifulSoup and Selenium.
> 2. Understand the website structure: inspect the HTML of the comments section.
> 3. Efficiency considerations: add delays between requests to avoid overloading the website's server, and handle pagination.
> 4. Sentiment analysis: use libraries like TextBlob or NLTK.

We don't have to use Selenium, because not every system supports it. I'd rather use regex to scrape the JSON.
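
A minimal sketch of the regex approach, for illustration only. The markup in `SAMPLE_HTML`, the `window.__DATA__` variable name, and the `extract_embedded_json` helper are all assumptions made up for this example. They are not Quora's real page structure or scrape-up's code.

```python
import json
import re
from typing import Any, Optional

# Hypothetical page: assumes the data is embedded as a JSON literal assigned
# to a variable inside a <script> tag. A real page may be laid out differently.
SAMPLE_HTML = """
<html><body>
<script>window.__DATA__ = {"question": "What is web scraping?", "answers": [{"author": "alice", "text": "Extracting data from pages."}]};</script>
</body></html>
"""

# Capture everything from the first "{" up to the "}" that is directly
# followed by ";" and the closing </script> tag.
DATA_PATTERN = re.compile(
    r"window\.__DATA__\s*=\s*(\{.*?\});\s*</script>",
    re.DOTALL,
)


def extract_embedded_json(html: str) -> Optional[Any]:
    """Return the parsed JSON payload, or None if it is missing or invalid."""
    match = DATA_PATTERN.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


data = extract_embedded_json(SAMPLE_HTML)
if data is not None:
    print(data["question"])
    for answer in data["answers"]:
        print(answer["author"], "-", answer["text"])
```

The regex only locates the JSON text; the actual parsing is left to `json.loads`, so malformed payloads are rejected instead of being half-parsed by pattern matching.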

nikhil25803 commented 4 months ago

Go ahead @Saurabh254

Note

All the best 👨‍💻