ToluClassics / Low-Resource-NLP-Tutorials

Getting started in NLP for low resource languages

Learning Word Embeddings for African Languages #1

Open ToluClassics opened 1 year ago

ToluClassics commented 1 year ago

The goal here is to create a Word2vec (CBOW and Skip-gram) Colab tutorial for learning word representations for African languages. We will start with English and then move on to other languages such as Yoruba, Igbo, Hausa, and Swahili. Only the corpora should need to change.
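To make the CBOW / Skip-gram distinction concrete, here is a minimal standard-library sketch of how the training examples for the two variants differ. It is not the tutorial notebook's code: the function names (`tokenize`, `skipgram_pairs`, `cbow_examples`), the whitespace tokenizer, the window size, and the placeholder sentence are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def tokenize(text: str) -> List[str]:
    """Naive lowercase whitespace tokenizer (an assumption; real corpora need more care)."""
    return text.lower().split()


def skipgram_pairs(tokens: List[str], window: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs: Skip-gram predicts each context word from the center word."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def cbow_examples(tokens: List[str], window: int = 2) -> Iterator[Tuple[List[str], str]]:
    """Yield (context_words, center) examples: CBOW predicts the center word from its context."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:
            yield context, center


# Placeholder English corpus; switching language should only mean replacing this list.
corpus = ["the cat sat on the mat"]

sg_pairs = [pair for line in corpus for pair in skipgram_pairs(tokenize(line), window=1)]
cbow = [ex for line in corpus for ex in cbow_examples(tokenize(line), window=1)]

print(sg_pairs[:3])
print(cbow[:2])
```

Both generators walk the same sliding window; only the direction of prediction differs, which is why the corpus is the one thing that has to change per language.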

We should only use datasets from Hugging Face. Here's one: https://huggingface.co/datasets/mc4

LMK if you need any help along the way

Seun-Ajayi commented 1 year ago

Alright boss, I will let you know when I need help.

Seun-Ajayi commented 1 year ago

@ToluClassics
Boss, I've gone through the materials you put up there, and I will be going through them again to absorb them better. Just thought I'd update you. Many thanks.

Also, what's next?

ToluClassics commented 1 year ago

I added a skeleton notebook for the word2vec tutorial; that is what we need to flesh out.

The idea is to learn the word embeddings and visualize them like they did here or here

Seun-Ajayi commented 1 year ago

Should I go ahead and attempt something rough with the dataset from Hugging Face once I've mastered the concept? That would be a good way to test my understanding.

ToluClassics commented 1 year ago

Oh, you don't have to master it; just put something together, and we'll review it and take it from there.