Closed by JRMeyer 2 years ago
In general, according to the Wikipedia page:
In standard written Hausa, tone is not marked. In recent linguistic and pedagogical materials, tone is marked by means of diacritics.
Does the data you have mark those?
There is ʼ (U+02BC MODIFIER LETTER APOSTROPHE) in the alphabet, which is probably equivalent to ’ (U+2019 RIGHT SINGLE QUOTATION MARK). I added the normalisation in 2ed8882.
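For readers unfamiliar with this kind of normalisation, here is a minimal sketch of what folding one apostrophe into the other could look like. It is not the code from the commit mentioned above: the function name `normalise_apostrophes` and the mapping direction (U+2019 to U+02BC, the form listed in the alphabet) are assumptions made for illustration.

```python
# Hypothetical mapping: rewrite U+2019 RIGHT SINGLE QUOTATION MARK as
# U+02BC MODIFIER LETTER APOSTROPHE. The direction is an assumption.
APOSTROPHE_MAP = str.maketrans({"\u2019": "\u02bc"})


def normalise_apostrophes(text: str) -> str:
    """Return text with every U+2019 replaced by U+02BC."""
    return text.translate(APOSTROPHE_MAP)


if __name__ == "__main__":
    sample = "\u2019yanci"
    # Show the code point of the first character before and after.
    print(f"U+{ord(sample[0]):04X}")
    print(f"U+{ord(normalise_apostrophes(sample)[0]):04X}")
```

Using a translation table keeps the mapping in one place, so further equivalent characters could be added to it later if another dataset needs them.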
yes, those two a's are quite common in the Bible text
I added them in 38b92b5. It seems those are non-standard diacritic marks, but we can update this if we get a different dataset.
these look like valid Hausa characters, but covo validate ha will either remove them or fail on them: ’, ā, ă
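When characters like these are rejected, it can help to check exactly which code points a sentence contains. The sketch below uses only Python's standard `unicodedata` module and does not reflect how covo itself validates text; the helper name `describe_non_ascii` is hypothetical.

```python
import unicodedata


def describe_non_ascii(text: str) -> list[tuple[str, str, str]]:
    """List (character, code point, Unicode name) for each non-ASCII character.

    The text is composed to NFC first, so "a" followed by a combining
    macron is reported as the single precomposed letter.
    """
    composed = unicodedata.normalize("NFC", text)
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in composed
        if ord(ch) > 127
    ]


if __name__ == "__main__":
    for entry in describe_non_ascii("\u2019 \u0101 \u0103"):
        print(entry)
```

Running this on a sample sentence shows whether a rejected letter is a precomposed character or a base letter plus a combining mark, which may affect how it should be added to an alphabet.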