Clueless-Community / scrape-up

A web-scraping-based Python package that enables you to scrape data from various platforms like GitHub, Twitter, Instagram, or any other useful website.
https://pypi.org/project/scrape-up/
MIT License
251 stars, 241 forks

Feat: Implement Multi-Language Support for Scraping #1138

Status: Closed (Ayushi-Choudhary22 closed this issue 3 months ago)

Ayushi-Choudhary22 commented 4 months ago

Describe the feature

Enhance the scraper to support multiple languages, enabling it to scrape content from non-English websites effectively. This will involve:

* Detecting the language of the website.
* Using appropriate libraries and methods to handle different character encodings.
* Adding translations for common scraping elements and error messages.
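The three steps above could be sketched roughly as follows. This is only an illustration using the Python standard library, not scrape-up's actual API: the helper names (`sniff_charset`, `decode_body`, `detect_language`, `message`) and the `MESSAGES` catalog are hypothetical, and language detection here simply trusts the page's `<html lang>` attribute rather than analysing the text itself.

```python
import re
from html.parser import HTMLParser

# Hypothetical message catalog; keys and translations are illustrative only.
MESSAGES = {
    "en": {"not_found": "Element not found"},
    "es": {"not_found": "Elemento no encontrado"},
}


class _LangParser(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def sniff_charset(raw: bytes, content_type: str = "") -> str:
    """Pick a charset from the Content-Type header, then a <meta> tag, else UTF-8."""
    header = re.search(r"charset=[\"']?([\w-]+)", content_type, re.I)
    if header:
        return header.group(1)
    meta = re.search(rb"<meta[^>]+charset=[\"']?([\w-]+)", raw[:2048], re.I)
    if meta:
        return meta.group(1).decode("ascii")
    return "utf-8"


def decode_body(raw: bytes, content_type: str = "") -> str:
    """Decode a response body, falling back to lenient UTF-8 on failure."""
    charset = sniff_charset(raw, content_type)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or wrong declaration: keep going with replacements.
        return raw.decode("utf-8", errors="replace")


def detect_language(html: str, default: str = "en") -> str:
    """Return the primary language subtag declared on <html>, e.g. 'ja' for 'ja-JP'."""
    parser = _LangParser()
    parser.feed(html)
    lang = parser.lang or default
    return lang.split("-")[0].lower()


def message(key: str, lang: str) -> str:
    """Look up a translated message, falling back to English."""
    return MESSAGES.get(lang, MESSAGES["en"]).get(key, MESSAGES["en"][key])
```

For pages that do not declare a language, a statistical detector would be needed instead of the `lang` attribute; which library to use for that is left open here.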

Add ScreenShots

[Four attached images: W2MDyeo0zkf and three WhatsApp screenshots dated 2024-07-25]

Web Scraper

Overview

This project is a web scraper designed to extract and process data from websites. It is currently tested on English websites and is being enhanced to handle multi-language content seamlessly.

Current Scraper Output

English Site

Description: Screenshot showing the scraper working perfectly with an English website.

Potential Issues with Non-English Sites

Text Encoding or Parsing Issues

Description: Screenshot displaying the scraper encountering issues with text encoding or parsing on a non-English website.
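One common cause of this kind of garbling is decoding UTF-8 bytes with a single-byte codec such as Latin-1. The short sketch below reproduces the effect with a made-up sample string; it is an assumption about the failure mode, not a diagnosis of the screenshot itself.

```python
# Sample text with one accented character (illustrative input only).
original = "Café"
raw = original.encode("utf-8")

# Decoding with the wrong codec does not raise; it silently yields mojibake.
garbled = raw.decode("latin-1")   # "CafÃ©"

# Decoding with the codec the bytes were actually written in recovers the text.
correct = raw.decode("utf-8")

# If only the garbled string is available, reversing the wrong step can repair it.
repaired = garbled.encode("latin-1").decode("utf-8")
```

Because the wrong decode succeeds without an exception, a scraper has to determine the encoding up front rather than rely on errors to signal the problem.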

Expected Output with Multi-Language Support

Mock-Up of Expected Results

Description: Mock-up of the expected output where the scraper handles multiple languages seamlessly.

Features

Record

github-actions[bot] commented 4 months ago

Hi there! Thanks for opening this issue. We appreciate your contribution to this open-source project. We aim to respond or assign your issue as soon as possible.

Ayushi-Choudhary22 commented 4 months ago

Hey @nikhil25803, I'm keen to contribute to the scraper project for GSSoC 2024 and would like to take on this issue (#1138). Could you assign it to me? Looking forward to getting started. Thanks!