Tatoeba / tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
https://tatoeba.org
GNU Affero General Public License v3.0

In advanced search, offer the option to limit searches to sentences owned by "native speakers" that only claim one native language. #1611

Open: ckjpn opened this issue 6 years ago

ckjpn commented 6 years ago

It would be nice if this were possible, since the more languages a person claims as native, the less likely it is that you can trust his/her sentences.

Here is one example, but there are others. https://tatoeba.org/eng/user/profile/tommy_ashiq

PaulPeer commented 6 years ago

"native speakers" that only claim one native language.

Good idea, but "one native language" is a bit strict. I know a few people who have two (for instance, a guy with a Dutch mother and a French father, raised in both languages), and it is precisely such people who become great translators or interpreters.

An alternative could be that admins get a warning message when a member enters 3 or more "native languages" so that they can take action.

Here is one example, but there are others.

That is of course fake. Probably he just misunderstood. Did you write to him?

ckjpn commented 6 years ago

While I think it's even possible for someone to legitimately claim 3 native languages (2 parents speaking 2 different languages, plus growing up and being educated in a country with a third language), I'd still like the option to choose a limit of one.

Perhaps it would be equally easy to offer users a choice that matches their comfort level: 1, 2, 3 or more.
I don't know what kind of programming that would require.
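To give a rough idea of the logic involved, here is a minimal Python sketch of such a filter. It assumes a simplified, hypothetical data model: the `User` and `Sentence` classes and the `native_sentences` function are illustrative only and are not Tatoeba's actual code or database schema. The `max_native_languages` parameter stands in for the "1, 2, 3 or more" choice suggested above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    # Hypothetical model: a username plus the languages the user claims as native.
    username: str
    native_languages: set = field(default_factory=set)


@dataclass
class Sentence:
    # Hypothetical model: sentence text, its language code, and its owner.
    text: str
    lang: str
    owner: User


def native_sentences(sentences, max_native_languages: Optional[int] = None):
    """Keep sentences whose owner claims the sentence's language as native.

    If max_native_languages is set (e.g. 1, 2 or 3), also drop sentences whose
    owner claims more native languages than that limit. None means no limit.
    """
    result = []
    for s in sentences:
        natives = s.owner.native_languages
        if s.lang not in natives:
            continue  # owner is not a self-declared native of this language
        if max_native_languages is not None and len(natives) > max_native_languages:
            continue  # owner claims more native languages than the chosen limit
        result.append(s)
    return result


# Example data (invented for illustration).
alice = User("alice", {"jpn"})
bob = User("bob", {"nld", "fra"})
carol = User("carol", {"eng", "fra", "deu", "jpn", "spa"})

sentences = [
    Sentence("猫が好きです。", "jpn", alice),
    Sentence("Ik hou van katten.", "nld", bob),
    Sentence("I like cats.", "eng", carol),
]

no_limit = native_sentences(sentences)                          # alice, bob, carol
at_most_two = native_sentences(sentences, max_native_languages=2)  # alice, bob
only_one = native_sentences(sentences, max_native_languages=1)     # alice
```

Passing `None` means no limit, so the same function covers both a plain "native speaker" filter and the stricter variants; the limit is just one extra comparison per sentence.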

jiru commented 6 years ago

@ckjpn If I understand correctly, you’re trying to solve the problem of deciding whether a member is really a native speaker or not. One way of doing this could indeed be to filter out people claiming more than one native language. However, as PaulPeer pointed out, you can easily end up filtering out some of the best contributors, so I don’t think it’s a good way to solve the problem. For now, isn’t it still possible to mark all the sentences of a dubious user as red? That way, they won’t show up in the search by default.

As for cases like tommy_ashiq’s, it would be interesting to see if that member did this on purpose or not. Maybe it’s just a mistake and there is room to improve the usability of the language level list.