Tatoeba / tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
https://tatoeba.org
GNU Affero General Public License v3.0

Disambiguate the concept of "Armenian". #3079

Closed by vxern 9 months ago

vxern commented 10 months ago

This issue would come in two parts:

Armenian is a pluricentric language comprising two major dialects: Eastern and Western.

The Armenian on Tatoeba is the Eastern variant, spoken mainly in Armenia, so the language could safely be renamed from "Armenian" to "Eastern Armenian"/"Armenian (Eastern)".

Wikipedia sources: https://en.wikipedia.org/wiki/Eastern_Armenian https://en.wikipedia.org/wiki/Western_Armenian

DJ-Saidez commented 10 months ago

Are all Armenian sentences on Tatoeba in the Eastern dialect, or only a majority of them? If there are also Western dialect sentences, we'd need to know how to re-sort them.

vxern commented 9 months ago

Are all Armenian sentences on Tatoeba in the Eastern dialect, or only a majority of them? If there are also Western dialect sentences, we'd need to know how to re-sort them.

The Western dialect is written in the classical orthography, so one can reliably tell which sentences are Western and which are Eastern by searching for certain letter combinations, for example «իւ» or «էօ». Having searched for some of these, I found only two hits, both of which were mistaken usages in Eastern sentences. I flagged those for the author to correct.
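For illustration, the search described above could be sketched roughly as follows. This is a minimal sketch, not how Tatoeba itself searches: the names `CLASSICAL_MARKERS`, `looks_classical` and `flag_candidates` are hypothetical, and the marker list contains only the two combinations mentioned here. A match only marks a sentence as a candidate for manual review, since a hit can also be a misspelling in an Eastern sentence.

```python
# Hypothetical helper names; the marker list is an assumption limited to the
# two letter combinations named in this comment, not an exhaustive rule set.
CLASSICAL_MARKERS = ("իւ", "էօ")


def looks_classical(sentence: str) -> bool:
    """Return True if the sentence contains a classical-orthography marker."""
    # Lowercase first so that capitalised or all-caps words are matched too.
    lowered = sentence.lower()
    return any(marker in lowered for marker in CLASSICAL_MARKERS)


def flag_candidates(sentences):
    """Return the sentences that deserve a manual Western/Eastern check."""
    return [s for s in sentences if looks_classical(s)]


if __name__ == "__main__":
    # "հիւր" is the classical spelling of "guest"; "հյուր" is the reformed one.
    sample = ["հիւր", "հյուր", "Ես ուսանող եմ"]
    for sentence in flag_candidates(sample):
        print(sentence)
```

Plain substring matching keeps the check easy to audit, at the cost of false positives that still need a human to look at them.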

I have only skimmed a sample of the sentences, but my best guess is that the rest can safely be treated as Eastern. If any Western sentences do turn up, I'd expect it to be easy to track down the others by their author.