darija-open-dataset / dataset

darija <-> english dataset

typos in sentence.csv #65

Closed fadouaabdoul closed 7 months ago

fadouaabdoul commented 1 year ago

I've been looking through sentence.csv and found that there are typos in some sentences, such as "8" used in place of "h" or "7". There are also sentences that could have more than one valid spelling, for example "kaychrab kass dyal lma" could also be written "kaychrb kass dlma". In addition, "x" is used instead of "ch".
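As a rough illustration of how such rows could be found for review, here is a minimal sketch that flags tokens containing "8" or "x". It assumes the sentences have already been read from sentence.csv into plain strings (the file's column names are not given in this thread), `flag_suspect_tokens` is a hypothetical helper name, and the sample sentences are made up for illustration rather than taken from the dataset.

```python
import re

# Heuristic only: "8" and "x" are the characters reported in this issue.
# A genuine number such as "8" would also be flagged, so matches need manual review.
SUSPECT = re.compile(r"[8x]", re.IGNORECASE)

def flag_suspect_tokens(sentence):
    """Return the whitespace-separated tokens that contain "8" or "x"."""
    return [tok for tok in sentence.split() if SUSPECT.search(tok)]

# Illustrative samples, not rows from the dataset.
samples = ["kaychrab kass dyal lma", "kayxrab kass dyal lma", "8ada howa"]
for s in samples:
    print(s, "->", flag_suspect_tokens(s))
```

This only surfaces candidates; deciding whether a flagged token is really a typo is left to a human reviewer.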


darija-open-dataset commented 1 year ago

Thanks Fadoua, will check that

fadouaabdoul commented 1 year ago

If possible, I can contribute to this repo by fixing these kinds of typos. If so, can we discuss the second problem of variant sentences further?

darija-open-dataset commented 1 year ago

Yeah ofc, that would be very helpful :D Contributions are always welcome and appreciated. Thanks :) For the second point, the obvious way to handle it would be to just add as many variations as possible. Otherwise, if you have any suggestions we're open to exploring them.

Thank you again Fadoua for your willingness to help improve the project!
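One possible way to picture the "add as many variations as possible" approach is to group the Darija spellings that share an English translation. The sketch below assumes each row is a (darija, english) pair, which is not confirmed by this thread; `group_variants` is a hypothetical helper, and the English gloss is an illustrative translation, not a value from the dataset.

```python
from collections import defaultdict

def group_variants(rows):
    """Map each English sentence to the distinct Darija spellings seen for it."""
    variants = defaultdict(list)
    for darija, english in rows:
        key = english.strip().lower()   # ignore case and surrounding spaces
        spelling = darija.strip()
        if spelling not in variants[key]:
            variants[key].append(spelling)
    return dict(variants)

# Assumed row layout; the gloss is illustrative only.
rows = [
    ("kaychrab kass dyal lma", "he drinks a glass of water"),
    ("kaychrb kass dlma", "He drinks a glass of water"),
]
print(group_variants(rows))
```

Keeping every spelling as its own row, as suggested above, stays compatible with this grouping: the variants can always be collected per translation afterwards.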