TraMiu Bahnar-Dataset issues - Githubissues

TraMiu / Bahnar-Dataset

A dataset for Low Resource Machine Translation in Bahnar

0 stars 0 forks

issues

Preprocess

#11 TraMiu closed 10 months ago
0
Find the monolingual data structure in low-resource machine translation

#10 TraMiu opened 11 months ago
0
Extract the English-Bahnar parallel corpus from the Bible

#9 TraMiu opened 11 months ago
1
Explore the monolingual text and how to implement it

#8 TraMiu opened 11 months ago
0
Learn about how to combine fastBPE with underthesea word segmentation

#7 TraMiu opened 11 months ago
0
Learn about the BLEU score and how to integrate it into our model

#6 TraMiu opened 11 months ago
2
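
As a starting point for issue #6, the core of BLEU (clipped n-gram precision combined with a brevity penalty) can be sketched in plain Python. This is a minimal single-reference, sentence-level version without smoothing; the function names `ngram_counts` and `simple_bleu` are hypothetical and nothing here is taken from the repository. For reported results, an established tool such as sacreBLEU would likely be a safer choice than a hand-written metric.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference, hypothesis, max_n=4):
    """Single-reference BLEU without smoothing (hypothetical helper).

    Returns 0.0 when any n-gram order has no match, which is common
    for short sentences.
    """
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        total = sum(hyp.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hypothesis) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(hypothesis))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return penalty * math.exp(sum(log_precisions) / max_n)
```

Both arguments are token lists, so the same tokenization has to be applied to the reference and to the model output before scoring.
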
Learn about few-shot learning and implement it

#5 TraMiu opened 11 months ago
1
Tokenize Vietnamese using the underthesea library and tokenize Bahnar based on a dictionary

#4 TraMiu closed 11 months ago
0
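
The dictionary-based half of issue #4 could be approached with greedy longest-match segmentation over whitespace-separated syllables. The sketch below is only an illustration: the function name `dictionary_tokenize`, the `max_len` limit, and the underscore joiner are all assumptions, and the lexicon in the usage note uses placeholder entries rather than real Bahnar words.

```python
def dictionary_tokenize(sentence, lexicon, max_len=4, joiner="_"):
    """Greedy longest-match word segmentation (hypothetical helper).

    `lexicon` is a set of multi-syllable entries written with single
    spaces between syllables. Syllables that start no known entry are
    emitted as single-syllable tokens.
    """
    syllables = sentence.split()
    tokens = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, falling back to one syllable.
        for size in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + size])
            if size == 1 or candidate in lexicon:
                tokens.append(candidate.replace(" ", joiner))
                i += size
                break
    return tokens
```

With a placeholder lexicon `{"a b", "c d e"}`, the call `dictionary_tokenize("a b x c d e", {"a b", "c d e"})` groups the known entries and leaves `x` on its own. Greedy matching can pick the wrong split when dictionary entries overlap, so the output is worth spot-checking.
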
Clean data (remove punctuation and digits, and convert to lowercase)

#3 TraMiu closed 11 months ago
0
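
The three cleaning steps named in issue #3 could look roughly like the sketch below. The function name `clean_line` is hypothetical, and treating every Unicode punctuation and number character as removable is an assumption; the repository may need to keep some characters (for example apostrophes that are part of Bahnar spelling).

```python
import unicodedata


def clean_line(text: str) -> str:
    """Lowercase a line and drop punctuation and digits (hypothetical helper)."""
    kept = []
    for ch in text.lower():
        category = unicodedata.category(ch)
        # "P*" categories are punctuation; "N*" covers digits and other numbers.
        if category.startswith("P") or category.startswith("N"):
            continue
        kept.append(ch)
    # Collapse the whitespace runs left behind by removed characters.
    return " ".join("".join(kept).split())
```

Letters with diacritics are kept, since they fall under letter or combining-mark categories. Symbols such as currency signs are also kept, because they are neither punctuation nor numbers in Unicode terms.
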
Check dataset and add more if not sufficient

#2 TraMiu opened 11 months ago
0
Add raw dataset to the repository with syntax [domain_number_language_assignee.txt]

#1 TraMiu closed 11 months ago
0
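
The file-naming convention from issue #1 lends itself to a small validation helper. The sketch below assumes that the four fields are separated by underscores, that no field contains an underscore itself, and that the number field is an integer; the names `RawFileName` and `parse_raw_name` and the sample file name in the usage note are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class RawFileName(NamedTuple):
    domain: str
    number: int
    language: str
    assignee: str


def parse_raw_name(filename: str) -> RawFileName:
    """Split a raw dataset file name into its four fields (hypothetical helper)."""
    path = Path(filename)
    if path.suffix != ".txt":
        raise ValueError(f"expected a .txt file, got {filename!r}")
    parts = path.stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields in {filename!r}")
    domain, number, language, assignee = parts
    # int() raises ValueError if the number field is not numeric.
    return RawFileName(domain, int(number), language, assignee)
```

For example, `parse_raw_name("bible_1_bahnar_tramiu.txt")` yields the fields `bible`, `1`, `bahnar`, and `tramiu`, while a name with a missing field raises `ValueError`.
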