common-voice / common-voice

Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
https://commonvoice.mozilla.org/
Mozilla Public License 2.0
3.28k stars 836 forks

[req] stress tests to prevent possible threats (CV main website/corpus) #3372

Open robovoice1 opened 2 years ago

robovoice1 commented 2 years ago

Is your feature request related to a problem? Please describe.

Fictional (general) scenario: A single paid troll, or a group of them, tries to create as many unusable contributions as possible (on the CV website and/or in the corpus) while also tying up as many resources as possible. This negates the idea of open source. Their aim is to slow CV down, create confusion, or stop it completely. How does CV react to contributors who constantly abuse the system?

In actions like (trolling within CV website):

Describe the solution you'd like

Monitoring mechanisms/processes that track, for every contributor, the quality of their contributions and actions (speak/validate). Based on this monitoring, obviously false contributions get removed. In the best case, these useless troll contributions are reverted automatically.
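
To make the monitoring idea a bit more concrete, here is a minimal sketch and not a description of how CV actually works. It assumes validation results are available as `(contributor_id, accepted)` pairs; the function name `flag_contributors` and the thresholds `min_clips` and `max_reject_rate` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def flag_contributors(
    votes: Iterable[Tuple[str, bool]],
    min_clips: int = 20,           # assumption: ignore contributors with few clips
    max_reject_rate: float = 0.8,  # assumption: arbitrary illustrative threshold
) -> List[str]:
    """Return contributor IDs whose clips are rejected unusually often."""
    totals = defaultdict(int)
    rejected = defaultdict(int)
    for contributor_id, accepted in votes:
        totals[contributor_id] += 1
        if not accepted:
            rejected[contributor_id] += 1
    # Only contributors with enough clips are judged, so a newcomer with a
    # few bad recordings is not flagged right away.
    return sorted(
        cid
        for cid, count in totals.items()
        if count >= min_clips and rejected[cid] / count > max_reject_rate
    )


votes = [("troll", False)] * 25 + [("alice", True)] * 30 + [("new", False)] * 3
flagged = flag_contributors(votes)
```

Flagged IDs would probably still need a human review before any revert or ban, because a high rejection rate can also come from a bad microphone or a strong accent.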

A clear sign to show those troll contributors that something is going in the wrong direction could be:

- Block and ban by IMEI number (smartphone)
- Block and ban by SIM card (smartphone)
- Block and ban by voice (if voice clips have been submitted)
- Block and ban by User/Session ID
- Block and ban the user account on the CV website and/or the Mozilla account

robovoice1 commented 2 years ago

What is the maximum number of contributors at which contributing stays stable?

The highest number of logged-in users I saw was close to 900. Contributing anything was not possible.

Are campaigns coordinated in some way? (Campaigning in different countries at the same time, or worldwide?)

Does CV support the idea of local CV servers (by country)? Legal terms by country?

robovoice1 commented 2 years ago

Bug #3371

robovoice1 commented 2 years ago

Also possible: code injection via GitHub into CV main (this has already been done, and reverted, in some Linux distros).

robovoice1 commented 2 years ago

Another possible scenario: MP3 viruses messing up the submitted recordings and/or the corpus. An example: in a music tracker program (Amiga/NoiseTracker), the instruments (saved samples) were corrupted by a virus. On playback you could hear loud noises/cracks, which made the samples unusable in the music module (MOD file). You had to edit the saved sample and re-cut it. Mostly these loud cracks (data that did not belong to the sample file) were at the end of the saved sample. A second option was running external programs to check the sample files for viruses.
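
A rough sketch of what a check for this kind of damage could look like. It assumes the clip has already been decoded to a list of signed PCM sample values (decoding MP3 is not shown), and it only looks for a tail that is much louder than the rest of the clip. The name `tail_is_suspicious` and the defaults `tail_fraction` and `ratio` are made up for illustration and are not tuned on real data.

```python
from typing import Sequence


def tail_is_suspicious(
    samples: Sequence[int],
    tail_fraction: float = 0.05,  # assumption: inspect the last 5% of the clip
    ratio: float = 4.0,           # assumption: arbitrary loudness ratio
) -> bool:
    """Return True when the end of a clip is much louder than its body."""
    if not 0.0 < tail_fraction < 1.0:
        raise ValueError("tail_fraction must be between 0 and 1")
    split = int(len(samples) * (1.0 - tail_fraction))
    body, tail = samples[:split], samples[split:]
    if not body or not tail:
        return False  # too short to compare
    body_peak = max(abs(s) for s in body)
    tail_peak = max(abs(s) for s in tail)
    if body_peak == 0:
        # A silent body followed by any signal is treated as suspicious too.
        return tail_peak > 0
    return tail_peak > ratio * body_peak


clean = [1000, -1000] * 500                 # steady signal, 1000 samples
corrupted = clean[:980] + [32000] * 20      # loud burst appended at the end
```

With these inputs, `tail_is_suspicious(clean)` is `False` and `tail_is_suspicious(corrupted)` is `True`. A speaker who ends a sentence loudly could trigger it as well, so this could only mark clips for review, not reject them.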

Are there ways to bypass clip submission through the CV website and write directly to the corpus?

robovoice1 commented 2 years ago

External attacks on the CV main server: DDoS, bots, etc.
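
One common building block against bots is per-client rate limiting. The following is a minimal token-bucket sketch, not something taken from the CV codebase. The class `TokenBucket` and its parameters are hypothetical, and the caller passes in the current time so the behaviour is easy to follow. It would not stop a distributed attack on its own.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    capacity: float            # maximum burst size, in requests
    refill_per_second: float   # sustained request rate allowed
    tokens: float = field(init=False)
    last_refill: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start with a full bucket so a new client can send a short burst.
        self.tokens = self.capacity

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) may pass."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(capacity=3, refill_per_second=1.0)
burst = [bucket.allow(now=0.0) for _ in range(5)]  # five requests at once
later = bucket.allow(now=2.0)                      # one request two seconds later
```

Here `burst` is `[True, True, True, False, False]` and `later` is `True`. In practice one bucket would be kept per client (for example per session ID), which ties in with the blocking ideas from the first comment.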

robovoice1 commented 2 years ago

@phirework feel free to remove this (I do not want to promote a "how to mess CV up" guide for the public)!

phirework commented 2 years ago

> The highest number of logged-in users I saw was close to 900. Contributing anything was not possible.

Can you clarify what you mean and when this happened?

robovoice1 commented 2 years ago

Date: 28.10.2021 - 01.11.2021
https://discourse.mozilla.org/t/cannot-login-to-mozilla-common-voice-and-cannot-download-datasets-resolved/87808
https://discourse.mozilla.org/t/issue-with-common-voice-website-loading/87821/2

[Screenshot_20211202-063244]

Voices online now: <900 (about 870, if I remember correctly) within the time period mentioned above. Logging in to the CV website was possible for me, but in the speak/validate section the white box with the sentences/clips to speak or validate did not show up. Contributing was not possible. After 1 minute I got disconnected from the CV site, and on retry I got a 503 server error.

phirework commented 2 years ago

FWIW that was the week that we had infrastructure upgrades that unfortunately coincided with a large marketing campaign (hence the high # of concurrent users), not a case of a deliberate attack. We've since upgraded our db to a larger size to handle more concurrent traffic, and from what I've been able to see the latency has dropped significantly even with campaigns running.

robovoice1 commented 2 years ago

req #3385

robovoice1 commented 2 years ago

req #3393

Phishing or stealing passwords with "unofficial apps" or fake websites would be harder with an additional security layer (hardware key)!