abhisheks008 / DL-Simplified

Deep Learning Simplified is an open-source repository containing beginner- to advanced-level deep learning projects for contributors who want to start their journey in deep learning. Devfolio URL: https://devfolio.co/projects/deep-learning-simplified-f013
https://quine.sh/repo/abhisheks008-DL-Simplified-499023976
MIT License
360 stars, 302 forks

Language Detection #335

Open yashhibare7 opened 1 year ago

yashhibare7 commented 1 year ago

Deep Learning Simplified Repository (Proposing new issue)

:red_circle: Title : Language Detection
:red_circle: Dataset : Kaggle
:red_circle: Approach : To detect the language of a word or sentence in Python, you can follow these steps: 1. Preprocess the input by removing punctuation and converting it to lowercase. 2. Tokenize the input into words or characters. 3. Use a language detection library like langdetect or textblob to identify the language based on statistical models.
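
A minimal sketch of steps 1 and 2 of the approach above, using only the Python standard library. The function names (`preprocess`, `word_tokens`, `char_ngrams`) are hypothetical and chosen for illustration, and the choice to strip only ASCII punctuation is an assumption of this sketch rather than something the issue specifies.

```python
import string

# Translation table that deletes ASCII punctuation. Assumption: non-ASCII
# punctuation (for example the Spanish inverted question mark) is left as-is.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Step 1: remove ASCII punctuation, lowercase, collapse whitespace."""
    cleaned = text.translate(_PUNCT_TABLE).lower()
    return " ".join(cleaned.split())


def word_tokens(text: str) -> list[str]:
    """Step 2 (word level): split the preprocessed text on whitespace."""
    return preprocess(text).split()


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Step 2 (character level): overlapping character n-grams."""
    cleaned = preprocess(text)
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


if __name__ == "__main__":
    print(preprocess("Hello, World!"))
    print(word_tokens("Bonjour, le monde!"))
    print(char_ngrams("Hola!", 3))
```

Step 3 is left to a library. With langdetect installed, for instance, `from langdetect import detect` followed by `detect(text)` returns a language code such as `en`.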


📍 Follow the Guidelines to Contribute in the Project :


:red_circle::yellow_circle: Points to Note :


:white_check_mark: To be Mentioned while taking the issue :


Happy Contributing 🚀

All the best. Enjoy your open source journey ahead. 😎

abhisheks008 commented 1 year ago

Please mention all the details about the project, @yashhibare7.

yashhibare7 commented 1 year ago

Details mentioned

abhisheks008 commented 1 year ago

This is a deep learning project repository; we expect contributors to come up with deep learning methods to solve the problem statements. Please revise your approach to include deep learning methods and share it again. @yashhibare7

ShatilKhan commented 11 months ago

Please assign this issue to me, @abhisheks008.

abhisheks008 commented 11 months ago

Can you please share your approach: how will you solve this issue, and which models will you use? @ShatilKhan

Soumiksb06 commented 9 months ago

Full name: Soumik Banerjee
GitHub Profile Link: https://github.com/Soumiksb06
Email ID: soumikbanerjee230@gmail.com
Approach: I will use the langid library for faster language detection. I'll also research the other available libraries and choose the best one.

Hi Abhishek, I'm completely new to open source but have a lot of experience building DL and ML models for prediction, and I've also worked on speech detection and emotion detection before. I feel this project would be a suitable start for my open source journey. I'm a SWOC 2024 contributor and have already completed and merged one issue. Kindly assign this one to me. Thank you!

abhisheks008 commented 9 months ago

Let the program start officially. Issues will be assigned after that. Until then, go through the repository and the README file.

Axikop commented 9 months ago

Full name: Aditya Kumar Singh
GitHub Profile Link: https://github.com/Axikop
Email ID: adi2003dps@gmail.com
Participant ID (if applicable):
Approach for this Project: To create a robust deep learning model, I will choose a suitable multilingual dataset from Kaggle and preprocess it by removing noise, converting the text to lowercase, handling links, and so on. I will also tokenize it and apply padding so that the neural network receives inputs of a consistent length. For the model architecture I will use a Recurrent Neural Network, because RNNs are well suited to capturing the sequential nature of language, understanding how words relate to each other in a sentence, and preserving context.
What is your participant role? Social Winter of Code 2024
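
The tokenize-and-pad step described above might look like the following sketch. It assumes character-level tokens (the comment does not say whether words or characters are used), reserves ids 0 and 1 for padding and unknown characters, and uses hypothetical helper names `build_vocab` and `encode`.

```python
PAD, UNK = 0, 1  # reserved ids; an assumption of this sketch


def build_vocab(texts):
    """Map every character seen in the training texts to an integer id >= 2."""
    chars = sorted({ch for text in texts for ch in text.lower()})
    return {ch: idx for idx, ch in enumerate(chars, start=2)}


def encode(text, vocab, max_len):
    """Convert text to ids, truncate to max_len, then right-pad with PAD."""
    ids = [vocab.get(ch, UNK) for ch in text.lower()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["ab", "ba c"])
    # Every encoded sequence has the same length, whatever the input length.
    print(encode("abz", vocab, 5))
    print(encode("abcabc", vocab, 4))
```

The fixed-length id lists can then be fed to an embedding layer followed by a recurrent layer in whichever deep learning framework is chosen.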

Axikop commented 9 months ago

Please assign me this issue @abhisheks008

Axikop commented 9 months ago

please reply @abhisheks008

abhisheks008 commented 9 months ago

Use at least 2-3 deep learning models/methods for this project, and compare them based on their accuracy scores to find the best-fitting model. Issue assigned to you, @Axikop.
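
One way to run the comparison requested above is sketched below in plain Python. It assumes every model has already produced predictions on the same held-out test set. The model names and labels in the usage example are made up for illustration and are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_model(y_true, predictions):
    """Score each model's predictions; return the top model name and all scores."""
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Illustrative labels only; not real experiment output.
    y_true = ["en", "fr", "de", "en"]
    predictions = {
        "rnn": ["en", "fr", "en", "en"],
        "lstm": ["en", "fr", "de", "en"],
        "cnn": ["fr", "fr", "de", "de"],
    }
    print(best_model(y_true, predictions))
```

Accuracy alone can be misleading when the languages are unevenly represented, so a per-language breakdown may be worth reporting alongside it.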

YashSachan2 commented 4 months ago

Full name: Yash Sachan
GitHub Profile Link: https://github.com/YashSachan2
Email ID: yash.sachan.ece22@itbhu.ac.in
Approach: I will use the Kaggle language detection dataset (https://www.kaggle.com/datasets/basilb2s/language-detection), which covers 17 languages. I will preprocess the data using the nltk library, perform tokenization and vectorisation, and then train by fine-tuning pre-trained models such as BERT, DistilBERT, and other pre-trained models from Hugging Face like LLaMA.
What is your participant role? GSSoC'24
Please assign me this issue, @abhisheks008.

abhisheks008 commented 4 months ago

Hi @YashSachan2, nice to have you here again! You can start working on this issue. Assigned to you.