Tatoeba / tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
https://tatoeba.org
GNU Affero General Public License v3.0

Konkani (knn) #2054

Closed (sabretou closed this issue 4 years ago)

sabretou commented 4 years ago

[for tatoeba.org] CALL add_new_language('knn', 9577);

jiru commented 4 years ago

Note that we may have a problem with the name "Konkani", because it can refer to both kok and knn.

SIL classifies Konkani (kok) as a macrolanguage that includes Goan Konkani (gom) and Konkani (knn). About Konkani (knn), the English Wikipedia says:

Maharashtrian Konkani more commonly spelt as Maharashtri Kokani is a group of dialects spoken in the Konkan region. It is often mistakenly extended to cover Goan Konkani, which is a member of a distinct and different set of dialects, because speakers of both refer to their language as simply "Konkani". -- gillux

sabretou commented 4 years ago

If I am not mistaken, we no longer accept macrolanguages for inclusion on Tatoeba, so kok is not an issue.

'gom' can be added as Goan Konkani when required, while 'knn' will be Konkani, as it is specified on SIL's website: https://iso639-3.sil.org/code/knn and on Ethnologue: https://www.ethnologue.com/language/KNN

jiru commented 4 years ago

Yes, we no longer accept macro languages. But users don't know about macro languages, ISO codes and SIL definitions; they only understand the language name we display.

The problem I'm pointing out is that if we add knn and name it Konkani, we may have people starting to add both knn and gom sentences inside that new corpus because "Konkani" alone is an ambiguous name, according to the quote from Wikipedia. This would be a problem since knn and gom are not mutually intelligible. And if later we add gom, it will be a mess to sort out gom sentences from knn.

sabretou commented 4 years ago

Ah yes, that's a good point that I did not consider.

Wikipedia labels knn as Maharashtri Konkani, but I'm not seeing that name in most other linguistic literature.

Still, I think we can use the same solution as we did for Punjabi. We could list the languages as Konkani (Maharashtrian) and Konkani (Goan).
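As a rough illustration of that naming scheme, here is a minimal Python sketch. The dictionary and the function name are hypothetical and are not taken from Tatoeba's codebase; it only shows that each ISO 639-3 code gets its own display name, and that a bare "Konkani" shared by both codes would be flagged.

```python
from collections import Counter

# Hypothetical display names keyed by ISO 639-3 code (not Tatoeba's actual data).
DISPLAY_NAMES = {
    "knn": "Konkani (Maharashtrian)",
    "gom": "Konkani (Goan)",
}

def ambiguous_names(names):
    """Return the display names that are shared by more than one language code."""
    counts = Counter(names.values())
    return sorted(name for name, n in counts.items() if n > 1)

# With qualified names, no display name is shared between codes.
print(ambiguous_names(DISPLAY_NAMES))

# If both codes were simply called "Konkani", the name would be flagged.
print(ambiguous_names({"knn": "Konkani", "gom": "Konkani"}))
```

This check only catches identical names; it cannot tell whether contributors will read an unqualified name as covering both languages, which is the concern raised above.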

What I am also concerned about is that we may need to create a transliteration engine for one or both of the Konkani languages. Although they are officially written in Devanagari, they also see very robust use of Roman script, including a specific alphabet.

I will have to discuss both of these issues with the requester. I will see if I can have them post here on GitHub.

thak123 commented 4 years ago

Hi Guys,

I am the requester of the language, and indeed Konkani (Maharashtrian) and Konkani (Goan) are two variants, so it would be wise to split them.

I am Goan, so I would be comfortable contributing to Goan Konkani. We also have two script variants, Romi and Devanagari.

This tool transliterates Romi to Devanagari and vice versa: http://konkanverter.com/konkanverter-ver-2-0/

sabretou commented 4 years ago

Hello @thak123 , and thank you so much for contributing to this discussion.

As you will be contributing in 'gom' and not 'knn', I think we can close this issue and create a new one for 'gom' instead.

thak123 commented 4 years ago

Kindly do the needful. Thanks
