globaleaks / GlobaLeaks

GlobaLeaks is free, open source software enabling anyone to easily set up and maintain a secure whistleblowing platform.
https://www.globaleaks.org

Research strategy and test for anti-Flood protection #1269

Open vecna opened 9 years ago

vecna commented 9 years ago

Some issues report performance doubts (#1224), and some issues remain open due to the lack of impact analysis on the anti-DoS strategies (#825, #797) and the load they cause on a system.

So I was thinking of performing some structured tests, collecting data, and providing meaningful defaults for the anti-DoS measures. With this data, we can also provide meaningful information to a third-party reviewer concerned about delays, captchas, and tokens.

In order to do so, I have to (code):

And research:

/cc @evilaliv3 @fpietrosanti

fpietrosanti commented 9 years ago

@vecna Did we ever consider "performance" an issue resulting from a DoS? The issues have been an excessive amount of submissions to be managed, from the receivers' point of view, and a problem with notifications, but not a performance problem. Btw, on the implementation side: because of the end-to-end investment in GLC, it would be much better to do any kind of flood testing with GLC itself, improving it, rather than writing code that will become outdated in a few months because it is used only for flood testing.

vecna commented 9 years ago

@fpietrosanti What I mean is that part of the anti-DoS work is understanding (through profiling) why the DB sometimes takes seconds to answer. If that can be controlled or influenced from the outside, it will delay all the activities performed via GL (because the transact is a single, blocking thread). So whatever affects performance has to be clearly understood, also from the reliability point of view.

I agree with the GLC consideration; I can just get rid of the Twisted stresser, but if Protractor does not permit me to control threads and parallel activities (one of the possible attack scenarios), I have to find a workaround. I don't want to make the test suite overcomplicated and time-expensive.

evilaliv3 commented 9 years ago

I agree that, at the stage we are at, writing the test for parallel submissions with Protractor, with a configurable number of requests and number of parallel requests, would be valuable. If you want to concentrate a bit on writing the Storm debugging code in the meantime, I would add this test (I would need no more than a day to finish the current fixes to the unit tests).
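Such a parallel-submission harness can be sketched with stdlib Python alone (a sketch under assumptions: the helper names are hypothetical, and a real test would drive GLClient or the HTTP API instead of the stub callable used here):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def flood(request_fn, total_requests, parallelism):
    """Fire `total_requests` calls to request_fn, keeping at most
    `parallelism` in flight at once; return per-request durations."""
    def timed_call():
        start = time.monotonic()
        request_fn()
        return time.monotonic() - start

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(timed_call) for _ in range(total_requests)]
        return [f.result() for f in futures]

if __name__ == "__main__":
    # Stand-in for a real submission request (e.g. an HTTP POST).
    durations = flood(lambda: time.sleep(0.01), total_requests=20, parallelism=5)
    print("max duration: %.3fs" % max(durations))
```

Both knobs discussed above (total requests and parallel requests) are explicit parameters, so the same harness can model both a slow trickle and a burst attack.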

In order to analyze the possible openings from performance issues to DoS issues, it would be important to be able to log:
1) request timings (from request to response) [I think you already did this, right @vecna?]
2) transact timings (each transact should be loggable with a duration) [I already have an idea of how to do it, but feel free to proceed if you want]
3
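Points 1 and 2 can share one small timing decorator. A minimal sketch in plain Python (the `timings` registry and label names are hypothetical; the real code would hook into the Twisted request handlers and the @transact wrapper):

```python
import functools
import time

# label -> list of observed durations, in seconds (hypothetical registry)
timings = {}

def timed(label):
    """Record how long each call of the wrapped function takes."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                timings.setdefault(label, []).append(time.monotonic() - start)
        return wrapper
    return decorator

@timed("submission_transact")
def store_submission():
    time.sleep(0.02)  # stand-in for the real DB work
```

The same decorator applied with different labels covers both request timings and transact timings without duplicating measurement code.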

vecna commented 9 years ago

@evilaliv3 Yes, the accounting of execution time for HTTP requests is already in place. In this issue I was thinking of dumping it to JSON, so it can be fed to a d3 graph rendering (it will be easier to show than to describe in the issue :) )
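The JSON dump could look like this (a sketch; the flat record shape is an assumption about what a d3 chart would bind to, and the function names are hypothetical):

```python
import json

def to_d3_records(timings):
    """Flatten {endpoint: [durations]} into the list of records
    a d3 chart typically binds to with .data()."""
    return [{"endpoint": endpoint, "duration": duration}
            for endpoint, samples in timings.items()
            for duration in samples]

def dump_timings(timings, path):
    """Write the flattened records as pretty-printed JSON."""
    with open(path, "w") as f:
        json.dump(to_d3_records(timings), f, indent=2)
```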

Storm transact timings are not an issue and can be done. I'm just curious to see the execution time of every single Python line (that's what a profiler can do), so we can see, for example, whether accessing a reference is more expensive or the time cost is only in the store.find() routine.
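The stdlib cProfile gives per-function rather than per-line timings (true per-line data needs a third-party tool such as line_profiler), but it is already enough to separate attribute-access cost from store.find() cost. A sketch of profiling a single call:

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile; return (result, report) where the report
    lists the ten functions with the highest cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()
```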

3 .. ?

evilaliv3 commented 9 years ago

1, 2, 3, stella! (the Italian "red light, green light!")

fpietrosanti commented 9 years ago

Following #1319: @evilaliv3, I don't think we need to track the timing of APIs, as there are no performance improvements/issues being considered in the anti-DoS work.

I'd suggest reconsidering the overall approach to testing, because writing scripts against the raw API means writing software destined to become "abandonware" the week after it has been written.

With active code required to make submissions (hashcash, client-side crypto, tokens, etc.), I think the only valuable approach to building a testing client is to base it on GLClient itself. That provides full testing of the APIs and stateful tracking of features, but especially code reuse and maintenance of the code being written.

evilaliv3 commented 9 years ago

Tracking the API timings is useful in general to verify that we have not written code that does not scale at all. Do you remember the preview issue (https://github.com/globaleaks/GlobaLeaks/issues/1224)?

fpietrosanti commented 9 years ago

@evilaliv3 To detect such a condition, we can just have the backend support automatic measurement of the execution time of each API and, if it goes above a certain value (e.g. 500ms), trigger an exception email notification to us. That way, without testing it directly, we always detect APIs that may have performance issues.
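A minimal sketch of that threshold check (the function and parameter names are hypothetical; in production, notify() would send the exception email, and in a Twisted backend the measurement would wrap the deferred callback chain rather than a synchronous call):

```python
import time

def watch_api(fn, notify, threshold=0.5):
    """Wrap an API handler; if a call takes longer than `threshold`
    seconds (default 500ms), report it via notify(name, elapsed)."""
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.monotonic() - start
            if elapsed > threshold:
                notify(fn.__name__, elapsed)
    return wrapper
```

The 500ms default mirrors the value suggested above; fast calls pass through with no overhead beyond two clock reads.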

evilaliv3 commented 9 years ago

Sure, but I don't find it bad to collect these values during unit testing and produce an HTML file that lets us view them. It costs little, and it helps verify the maturity of the app.

fpietrosanti commented 9 years ago

But we do not need general performance profiling in a simulated environment (because we have no performance issues); maybe we only need a way to catch those erroneous conditions in a production environment.

Btw, anti-flood techniques are not targeting performance issues and should be outside the scope of this testing.

evilaliv3 commented 9 years ago

I have a different idea on this:

  1. Can a DoS be caused by an API that is not well designed and does not scale, where a performance issue can be exploited by a malicious user who, with a single request, makes the system perform tons of work? Yes!
  2. Do we systematically track the issues of point 1? NO!

=> We are not proving in any scientific way, neither with a demonstration nor empirically, that our system does not open DoS doors. :)