Open trang opened 7 years ago
On 2017/08/13, Trang wrote:
Problem
Lately there has been a large number of sentences added seemingly by some bot.
https://tatoeba.org/eng/sentences/of_user/VITAE
https://tatoeba.org/eng/sentences/of_user/Strategos
https://tatoeba.org/eng/sentences/of_user/Alva
Here's a sample of the kind of sentences added:
Il désire gamberger. Miou-Miou s'échappe. Le cuisinier jongle. Des scouts pleurent. Vous alliez partout. L'architecte dérive. Des fakirs mèneront. J'aplatis ces mises. Des bébés embraient. La voyageuse stagne. Un groupe se coiffe. Nous allons à Kyoto. Des fumeurs aboient. Le traitre guerroie.
Many of these sentences don't make much sense. While they aren't all incorrect, overall they don't add much value to the corpus.
The timestamps of the sentences do suggest, though, that some kind of automation was used to add them.
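One way to make the timestamp observation concrete is to look at the gaps between consecutive submissions by the same user. The following is only a sketch in Python: the function name `looks_automated` and both thresholds are assumptions made for illustration, not anything taken from Tatoeba's code or from this thread.

```python
from datetime import datetime, timedelta
from statistics import median


def looks_automated(timestamps, max_median_gap=timedelta(seconds=2), min_sentences=20):
    """Heuristic: flag a user whose sentences arrive at a very fast, steady pace.

    timestamps: list of datetime objects, one per sentence added by the user.
    Both thresholds are hypothetical and would need tuning on real data.
    """
    if len(timestamps) < min_sentences:
        return False  # too few sentences to judge
    ordered = sorted(timestamps)
    # Seconds elapsed between each pair of consecutive submissions.
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) <= max_median_gap.total_seconds()
```

A heuristic like this can only raise a flag for a human to review; a fast typist pasting prepared sentences could trip it, and a throttled bot could evade it.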
Honestly I don't think this particular example is harmful to Tatoeba. The sentences seem relatively correct.
Possible solution
Just like we've put a limit on the number of private messages that new users can send per day, we could put a limit on how many sentences a new contributor can add per day. This would at least give admins more time to react and prevent thousands of nonsensical sentences from being added.
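A minimal sketch of such a per-day cap, written in Python purely for illustration (it is not Tatoeba's actual code). The class name `DailySentenceLimiter`, the constants `NEW_ACCOUNT_AGE` and `DAILY_LIMIT_NEW`, and their values are all assumptions, since the proposal gives no concrete numbers.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; the proposal does not specify any values.
NEW_ACCOUNT_AGE = timedelta(days=14)  # how long an account counts as "new"
DAILY_LIMIT_NEW = 50                  # sentences a new account may add per day


class DailySentenceLimiter:
    """In-memory counter of sentences added per user per calendar day."""

    def __init__(self, new_account_age=NEW_ACCOUNT_AGE, daily_limit=DAILY_LIMIT_NEW):
        self.new_account_age = new_account_age
        self.daily_limit = daily_limit
        self._counts = defaultdict(int)  # (user, date) -> sentences added that day

    def try_add(self, user, registered_at, now):
        """Return True (and record the sentence) if the user may add one now."""
        if now - registered_at >= self.new_account_age:
            return True  # established accounts are not limited in this sketch
        key = (user, now.date())
        if self._counts[key] >= self.daily_limit:
            return False  # limit reached for today
        self._counts[key] += 1
        return True
```

A real implementation would keep the counter in the database rather than in memory, and would probably reuse whatever mechanism already enforces the private-message limit.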
I am not against what you suggest, but I think it would be good to detail a bit more what we want for an implementation. Going to board a plane atm, I'll try and have a look at what we currently do for messages.
-- Maxime “pep” Buquet
It might also be a good idea to think of a level system where you have to contribute a couple sentences and get those reviewed by another member before being allowed to continue.
On 2017/10/19, Fabian Becker wrote:
It might also be a good idea to think of a level system where you have to contribute a couple sentences and get those reviewed by another member before being allowed to continue.
Yep, that would be a good idea. I'd be quite in favor of that. We'd have to find sensible defaults, though, so as not to discourage legitimate users arriving for the first time.
That could also go into the gamification discussion that appeared at some point (at least on the channel). I'm not sure there is any issue about this yet.
-- Maxime “pep” Buquet
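The level system discussed above could look roughly like the following Python sketch. Everything here is hypothetical: the `Contributor` class, the constants `PROBATION_QUOTA` and `REQUIRED_APPROVALS`, and their values are placeholders for the "sensible defaults" that the thread leaves open.

```python
from dataclasses import dataclass

# Hypothetical defaults; the thread does not settle on any numbers.
PROBATION_QUOTA = 5      # sentences a new contributor may add before review
REQUIRED_APPROVALS = 3   # approved sentences needed to leave probation


@dataclass
class Contributor:
    name: str
    pending: int = 0    # sentences awaiting review
    approved: int = 0   # sentences approved by another member

    @property
    def on_probation(self) -> bool:
        return self.approved < REQUIRED_APPROVALS

    def can_add(self) -> bool:
        if not self.on_probation:
            return True
        # While on probation, only a small batch may exist before review.
        return self.pending + self.approved < PROBATION_QUOTA

    def add_sentence(self) -> bool:
        """Record a new sentence; return False if the contributor must wait."""
        if not self.can_add():
            return False
        if self.on_probation:
            self.pending += 1
        return True

    def approve(self, reviewer: str) -> None:
        """Another member approves one pending sentence."""
        if reviewer == self.name:
            raise ValueError("a contributor cannot review their own sentence")
        if self.pending == 0:
            raise ValueError("nothing to review")
        self.pending -= 1
        self.approved += 1
```

With these placeholder values, a newcomer can add five sentences right away, which keeps the first visit friendly, and is unblocked as soon as three of them have been approved.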
Just for the record, there's been a recent report of bots: https://tatoeba.org/eng/wall/show_message/36111#message_36111
Just in case this helps, here is the number of sentences contributed last week, per language and username, that DO NOT have linked sentences. This may help you identify some of the problem usernames. Note that this is only one week's data, and it is limited to users with over 100 unlinked sentences in the given language.
kab : 9522 : https://tatoeba.org/eng/user/profile/Iflis_Illel
kab : 8017 : https://tatoeba.org/eng/user/profile/imalaqvayli
kab : 4095 : https://tatoeba.org/eng/user/profile/Selyan
kab : 1705 : https://tatoeba.org/eng/user/profile/Igider
eng : 1000 : https://tatoeba.org/eng/user/profile/CK
kab : 978 : https://tatoeba.org/eng/user/profile/Ubezwi1
kab : 838 : https://tatoeba.org/eng/user/profile/alemfarid
kab : 802 : https://tatoeba.org/eng/user/profile/yiwenkan
hun : 703 : https://tatoeba.org/eng/user/profile/Tilelli
eng : 663 : https://tatoeba.org/eng/user/profile/Amastan
kab : 470 : https://tatoeba.org/eng/user/profile/BenkerouHani
ber : 423 : https://tatoeba.org/eng/user/profile/Tilelli
hun : 392 : https://tatoeba.org/eng/user/profile/Tamazight
spa : 367 : https://tatoeba.org/eng/user/profile/Javea
ber : 327 : https://tatoeba.org/eng/user/profile/Tamazight
eng : 299 : https://tatoeba.org/eng/user/profile/IE
eng : 299 : https://tatoeba.org/eng/user/profile/DJ_Saidez
ber : 218 : https://tatoeba.org/eng/user/profile/Mouloud
kab : 197 : https://tatoeba.org/eng/user/profile/AmarMecheri
rus : 178 : https://tatoeba.org/eng/user/profile/marafon
spa : 126 : https://tatoeba.org/eng/user/profile/Tagawawt
deu : 122 : https://tatoeba.org/eng/user/profile/Pfirsichbaeumchen
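For reference, a report of this shape can be produced with a simple aggregation. This is a sketch only: the input format (tuples of sentence id, language, username, plus a set of linked sentence ids) and the function name `unlinked_counts` are assumptions, and it is not necessarily how the figures above were computed.

```python
from collections import Counter

MIN_UNLINKED = 100  # the report above lists only users with over 100 unlinked sentences


def unlinked_counts(sentences, linked_ids, min_count=MIN_UNLINKED):
    """Count unlinked sentences per (language, username).

    sentences:  iterable of (sentence_id, lang, username) tuples (assumed format).
    linked_ids: set of sentence ids that have at least one link.
    Returns (lang, count, username) rows with count > min_count, largest first.
    """
    counts = Counter(
        (lang, user)
        for sentence_id, lang, user in sentences
        if sentence_id not in linked_ids
    )
    rows = [(lang, n, user) for (lang, user), n in counts.items() if n > min_count]
    return sorted(rows, key=lambda row: (-row[1], row[0], row[2]))
```

Each returned row maps directly onto one "lang : count : profile" line of the list.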