adhilcodes / Annvi-classifier

Malayalam Gender Classifier - Using Machine Learning to Predict Gender of Individuals using their name
MIT License
2 stars, 3 forks

Data collection #1

Open adhilcodes opened 2 years ago

adhilcodes commented 2 years ago

Malayalam is a highly inflectional and agglutinative language compared to many other languages. Very few people seem to have applied machine learning and deep learning techniques to Malayalam, whereas other languages have seen a lot of progress and have plenty of NLP datasets, blogs, and tutorials available. Since we don't have proper data for our task, we will be creating the dataset manually. You can also scrape the data from websites and preprocess it.
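For anyone going the scraping route, here is a rough sketch of what the preprocessing step could look like. It is only an illustration, not the project's actual pipeline: the function names (`is_malayalam_name`, `clean_names`), the `(name, gender)` row shape, and the `m`/`f` labels are all assumptions made for the example. It keeps only names written entirely in Malayalam script (Unicode block U+0D00 to U+0D7F), normalizes them, and drops duplicates.

```python
import unicodedata

# Malayalam Unicode block: U+0D00 to U+0D7F.
MALAYALAM_START, MALAYALAM_END = 0x0D00, 0x0D7F
# Zero-width non-joiner / joiner, which show up in older chillu encodings.
ALLOWED_EXTRA = {"\u200c", "\u200d"}


def _in_malayalam_block(ch):
    return MALAYALAM_START <= ord(ch) <= MALAYALAM_END


def is_malayalam_name(text):
    """Return True if the text contains Malayalam characters and nothing else
    (apart from whitespace and zero-width joiners)."""
    chars = [c for c in text if not c.isspace()]
    if not any(_in_malayalam_block(c) for c in chars):
        return False
    return all(_in_malayalam_block(c) or c in ALLOWED_EXTRA for c in chars)


def clean_names(raw_rows):
    """Clean an iterable of (name, gender) pairs (hypothetical row format).

    Normalizes Unicode and whitespace, keeps rows labelled 'm' or 'f' whose
    name is in Malayalam script, and drops repeated names. If a name appears
    with two different labels, only the first one seen is kept.
    """
    seen = set()
    cleaned = []
    for name, gender in raw_rows:
        name = unicodedata.normalize("NFC", name)
        name = " ".join(name.split())  # trim and collapse whitespace
        gender = gender.strip().lower()
        if gender not in {"m", "f"} or not is_malayalam_name(name):
            continue
        if name in seen:
            continue
        seen.add(name)
        cleaned.append((name, gender))
    return cleaned


if __name__ == "__main__":
    rows = [("  അനു ", "F"), ("അനു", "f"), ("Rahul", "m"), ("രാഹുൽ", "M")]
    for name, gender in clean_names(rows):
        print(name, gender)
```

Names that can belong to any gender would need a different rule than "first label wins", so treat that part as a placeholder.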

adhilcodes commented 2 years ago

I attempted to convert an English name dataset to Malayalam by writing a small Python script, but it ended in failure. You can also give it a try 😉!

Good luck👍🏻