Closed: Ayushi-Choudhary22 closed this issue 3 months ago.
Hi there! Thanks for opening this issue. We appreciate your contribution to this open-source project. We aim to respond or assign your issue as soon as possible.
Hey @nikhil25803, I’m keen to contribute to the scraper project for GSSoC 2024, and I’d like to tackle issue #1138. Could you assign it to me? Looking forward to getting started. Thanks!
Describe the feature
Enhance the scraper to support multiple languages, enabling it to scrape content from non-English websites effectively. This will involve:
* Detecting the language of the website.
* Using appropriate libraries and methods to handle different character encodings.
* Adding translations for common scraping elements and error messages.
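As a rough illustration of the first two points, here is a minimal standard-library sketch, not the project's actual code. It assumes the page language can be read from the `<html lang="...">` attribute and the encoding from the HTTP `Content-Type` header or a `<meta>` charset declaration. The names `charset_from_content_type`, `LangCharsetSniffer`, and `decode_page` are hypothetical. It ignores byte-order marks, and pages that omit or mislabel `lang` would need a statistical detector (for example the `langdetect` package) instead.

```python
import codecs
from html.parser import HTMLParser
from typing import Optional, Tuple


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if present."""
    # e.g. "text/html; charset=Shift_JIS" -> "Shift_JIS"
    for param in value.split(";")[1:]:
        key, _, val = param.partition("=")
        if key.strip().lower() == "charset" and val.strip():
            return val.strip().strip("\"'")
    return None


class LangCharsetSniffer(HTMLParser):
    """Hypothetical helper: records <html lang> and the first <meta> charset."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "html" and attributes.get("lang"):
            self.lang = attributes["lang"].strip()
        elif tag == "meta" and self.charset is None:
            if attributes.get("charset"):
                self.charset = attributes["charset"].strip()
            elif (attributes.get("http-equiv") or "").lower() == "content-type":
                self.charset = charset_from_content_type(attributes.get("content") or "")


def decode_page(raw: bytes, content_type: str = "") -> Tuple[str, Optional[str]]:
    """Decode raw HTML bytes; return (text, declared language or None)."""
    # Latin-1 maps every byte to a character, so this never fails and keeps
    # the ASCII markup we need to sniff intact.
    sniffer = LangCharsetSniffer()
    sniffer.feed(raw.decode("latin-1"))
    sniffer.close()

    # Assumed precedence: HTTP header, then <meta>, then UTF-8 as a default guess.
    encoding = charset_from_content_type(content_type) or sniffer.charset or "utf-8"
    try:
        codecs.lookup(encoding)
    except LookupError:
        encoding = "utf-8"  # unknown label: fall back instead of crashing
    # errors="replace" keeps scraping going if a few bytes are invalid.
    return raw.decode(encoding, errors="replace"), sniffer.lang
```

The returned language code could then drive per-language handling further down the pipeline; checking the header before `<meta>` mirrors how browsers generally prioritise the two.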
Add Screenshots
Web Scraper
Overview
This project is a web scraper designed to extract and process data from websites. It is currently tested on English websites and is being enhanced to handle multi-language content seamlessly.
Current Scraper Output
English Site
Description: Screenshot showing the scraper working perfectly with an English website.
Potential Issues with Non-English Sites
Text Encoding or Parsing Issues
Description: Screenshot displaying the scraper encountering issues with text encoding or parsing on a non-English website.
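A common cause of the encoding problem described above is UTF-8 bytes being decoded as Latin-1, which turns non-English text into mojibake. The sketch below reproduces that failure and shows one heuristic repair. `repair_latin1_mojibake` is a hypothetical helper, and it only covers Latin-1 misdecoding (Windows-1252 and other mix-ups need different handling). Decoding with the correct charset in the first place remains the better fix.

```python
def repair_latin1_mojibake(text: str) -> str:
    """Reverse UTF-8 text that was wrongly decoded as Latin-1.

    Heuristic: if the text does not round-trip, it was probably decoded
    correctly (or garbled some other way), so it is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "Привет, мир"
garbled = original.encode("utf-8").decode("latin-1")  # simulate the bug
print(garbled == original)              # False: Cyrillic turned into mojibake
print(repair_latin1_mojibake(garbled))  # Привет, мир
print(repair_latin1_mojibake("Café"))   # Café (already correct, left as-is)
```

Because some correctly decoded Latin-1 strings happen to be valid UTF-8 byte sequences, this repair can occasionally misfire and should not be applied blindly.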
Expected Output with Multi-Language Support
Mock-Up of Expected Results
Description: Mock-up of the expected output where the scraper handles multiple languages seamlessly.
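The feature description also calls for translated error messages. A lightweight way to sketch this is a message catalog keyed by message ID and language code, with fallback from a regional code such as `es-MX` to its base language and then to English. The catalog entries, message IDs, and the `translate` function are all hypothetical; a larger catalog would more likely use Python's standard `gettext` module.

```python
from typing import Dict

# Hypothetical catalog: message ID -> {language code -> template}.
MESSAGES: Dict[str, Dict[str, str]] = {
    "page_not_found": {
        "en": "Page not found: {url}",
        "es": "Página no encontrada: {url}",
        "fr": "Page introuvable : {url}",
    },
    "element_missing": {
        "en": "Could not find element: {selector}",
        "es": "No se pudo encontrar el elemento: {selector}",
    },
}


def translate(message_id: str, lang: str = "en", **params: str) -> str:
    """Return the message in `lang`, falling back to its base language, then English."""
    templates = MESSAGES[message_id]
    base = lang.split("-")[0].lower()  # "es-MX" -> "es"
    template = templates.get(lang.lower()) or templates.get(base) or templates["en"]
    return template.format(**params)


print(translate("page_not_found", "es-MX", url="https://example.com"))
# Página no encontrada: https://example.com
print(translate("element_missing", "fr", selector="h1"))
# Could not find element: h1  (no French entry, so English is used)
```

The `lang` argument could come from the language detected on the page or from a user setting; which one the scraper should prefer is left open here.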
Features
Record