Clueless-Community / scrape-up

A web-scraping-based python package that enables you to scrape data from various platforms like GitHub, Twitter, Instagram, or any useful website.
https://pypi.org/project/scrape-up/
MIT License

Wikipedia Scraper [GSSoC'23] #257

Closed. prernamittal closed this issue 1 year ago

prernamittal commented 1 year ago

Is your feature request related to a problem? Please describe.

Yes. There is a need for a Wikipedia scraper that extracts information from Wikipedia pages automatically. Such a scraper would be useful for data analysis, research, content aggregation, and similar tasks. Automating the extraction saves time and effort, because users can retrieve exactly the data they need in a structured format. This makes data collection and analysis more efficient and makes it possible to build customized datasets for specific research or application purposes.

Describe the solution you'd like

The solution is to create a Wikipedia scraper that retrieves specific information from Wikipedia pages without manual intervention. The data extraction operations can be performed with BeautifulSoup. The code will demonstrate extracting the title, introduction paragraph, headings, links, and references from a Wikipedia page. The extracted data will be printed to the console for demonstration purposes. In a real-world scenario, you could store the data in variables, write it to a file, or process it further according to your requirements.
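A minimal sketch of the extraction steps described above, not the package's actual implementation. To keep it runnable without extra installs, it swaps in Python's standard-library `html.parser` in place of BeautifulSoup and parses an inline HTML string instead of fetching a live page. The class name `WikiPageParser`, the function `scrape_page`, and the `SAMPLE_HTML` snippet are all hypothetical, and the sketch assumes the title is the first `<h1>`, the introduction is the first non-empty `<p>`, and section headings are `<h2>` elements. References are left out for brevity.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a downloaded Wikipedia page.
SAMPLE_HTML = """
<html><body>
<h1 id="firstHeading">Web scraping</h1>
<p></p>
<p>Web scraping is <a href="/wiki/Data_scraping">data scraping</a> used for extracting data from websites.</p>
<h2>History</h2>
<p>Later paragraph.</p>
<h2>Techniques</h2>
<a href="/wiki/HTML">HTML</a>
</body></html>
"""


class WikiPageParser(HTMLParser):
    """Collects the title, first paragraph, section headings, and links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.intro = ""
        self.headings = []
        self.links = []
        self._tag = None    # tag whose text is currently being collected
        self._buffer = []   # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # Record every link target, wherever the <a> appears.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        # Start collecting text for the elements we care about.
        if tag in ("h1", "h2", "p") and self._tag is None:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._buffer).strip()
        self._tag = None
        if not text:
            return  # skip empty elements such as <p></p>
        if tag == "h1" and not self.title:
            self.title = text
        elif tag == "h2":
            self.headings.append(text)
        elif tag == "p" and not self.intro:
            self.intro = text


def scrape_page(html):
    """Parse an HTML string and return the extracted fields as a dict."""
    parser = WikiPageParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "intro": parser.intro,
        "headings": parser.headings,
        "links": parser.links,
    }


result = scrape_page(SAMPLE_HTML)
for key, value in result.items():
    print(f"{key}: {value}")
```

With BeautifulSoup, the same steps would typically be written with `soup.find` and `soup.find_all` on the corresponding tags. Returning a dict rather than only printing makes it easy to write the data to a file or process it further.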

Describe alternatives you've considered

Alternatives include manually copying and pasting the desired information from Wikipedia pages, or relying on pre-existing datasets. However, these methods can be time-consuming and error-prone, and they may not provide the flexibility to retrieve specific, up-to-date information.

I request @nikhil25803 to kindly assign this issue to me under GSSoC'23. Thanks!

nikhil25803 commented 1 year ago

Great Idea @prernamittal Note