darija-open-dataset / dataset

darija <-> english dataset

more data in different semantic categories #64

Closed moun3imy closed 1 year ago

moun3imy commented 1 year ago

I added more data in different semantic categories. I think I can be useful to this repository: I noticed that Oriental Darija (Oujda, Berkane, Guercif, Nador, etc.) is missing, and I am from this region. I will add words from this variant where they are missing.

darija-open-dataset commented 1 year ago

Hey Moun3im, yeah, that would be great. Thanks. Regarding this PR: to keep the dataset consistent, it would be better if you could change a few lines to meet guideline number 11 (https://github.com/darija-open-dataset/dataset#guidelines--recommendations).

For example, "sa3ada","saaada","lfer7a","","happiness" should be separated into two lines:

"sa3ada","saaada","","","happiness"
"lfer7a","","","","happiness"
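The split requested above can be sketched in Python. This is a minimal illustration, not the repository's tooling: the row layout (four Darija spelling columns followed by one English column) is inferred from the example row, and the names `split_synonym`, `to_csv`, and `N_VARIANTS` are hypothetical. The script cannot tell a spelling variant from a different word, so the word to move must be named by hand.

```python
import csv
import io

# Assumption: each row has four Darija spelling columns, then one English column.
N_VARIANTS = 4


def pad(words):
    """Right-pad a list of spellings with empty strings up to N_VARIANTS."""
    return words + [""] * (N_VARIANTS - len(words))


def split_synonym(row, synonym):
    """Move `synonym` out of `row` into its own row with the same English gloss."""
    *variants, english = row
    # Keep the remaining non-empty spellings of the original word.
    kept = [v for v in variants if v and v != synonym]
    return [pad(kept) + [english], pad([synonym]) + [english]]


def to_csv(rows):
    """Serialize rows with every field quoted, as in the example row."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = split_synonym(["sa3ada", "saaada", "lfer7a", "", "happiness"], "lfer7a")
print(to_csv(rows), end="")
```

Run on the row from the example, this prints the two rows given above, one word per line with its own spelling variants.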

Thanks again :)

moun3imy commented 1 year ago

Hi @darija-open-dataset, Thanks for pointing out the guideline. I fixed that issue.

darija-open-dataset commented 1 year ago

Great. Thank you!