Exploitation Algorithm - Githubissues

andresriancho commented 8 years ago

The timing attack algorithm should be able to discover a valid, hard-coded, API key with zero knowledge.

Some ideas:

[ ] Brute force the last N chars
[ ] In systems with many valid API keys it might be necessary to have the algorithm do something like this:
- Send 0<padding>
- Send ...<padding>
- Send f<padding>
- Compare all the timing data. If they are all equal then either a) there is no timing side channel, or b) every candidate is a valid first character of some key and the first character of <padding> is not a valid next character for any of them.
- Send 00<padding>
- Send ...<padding>
- Send ff<padding>
- Compare all the timing data, repeat the same logic as before.
[ ] Once the first N chars are identified it should be possible to start with the char by char compare
[ ] Something proposed in this research (page 15) is to brute-force in tuples of N (three) chars. This is useful because a correct tuple makes the target application take about N times longer to respond than a single correct char would (in most cases), which makes the delay easier to measure. Once this feature is implemented the tool should be able to tell the user something like:

$ ./pico ...
...
Was able to find a timing difference using 4 bytes.
Will continue the timing attack using 4-byte blocks.

[ ] Related to the previous point (mathematical amplification), it might be a good idea to know how many chars at a time we're able to measure successfully. In order to find out, the tool might send:
- Send 0123<bad-padding> (0123 is known good chars)
- Send 01234567<bad-padding> (01234567 is known good chars)
- If we can measure the time difference, then we know that at least we can measure the time it takes to compare 4 chars
- Perform the same test with 3 known good chars (012<bad-padding>, vs 012345<bad-padding>), if successful move to 2 known good, etc.
[ ] The algorithm should have some kind of back-tracking code which triggers when all the samples for a character have the same delay.
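
A minimal sketch of one step of the ideas above: brute-force the next block of chars and signal back-tracking when no candidate stands out. Everything here is an assumption made for illustration: the hex alphabet (taken from the 0..f example), a known key length, the `!` bad-padding char, the `min_gap` threshold, and the `measure` callable, which stands in for whatever code actually times a request. `simulated_oracle` is only a noise-free stand-in for a real target.

```python
import itertools
import statistics

HEX = "0123456789abcdef"  # assumed key alphabet, as in the 0..f example


def simulated_oracle(secret, per_char=1.0):
    """Noise-free stand-in for a target doing a char-by-char compare."""
    def measure(candidate):
        matched = 0
        for a, b in zip(candidate, secret):
            if a != b:
                break
            matched += 1
        return matched * per_char  # "time" grows with the matched prefix
    return measure


def next_block(known, block_size, key_len, measure,
               samples=5, min_gap=0.5, pad_char="!"):
    """Guess the next `block_size` chars that follow `known`.

    `measure(candidate)` is a hypothetical callable returning one timing.
    Returns the best block, or None when the slowest candidate is not
    clearly slower than the runner-up (the signal to back-track).
    """
    medians = {}
    for chars in itertools.product(HEX, repeat=block_size):
        block = "".join(chars)
        # Known-good prefix + guessed block + bad padding up to key_len.
        candidate = (known + block).ljust(key_len, pad_char)
        timings = [measure(candidate) for _ in range(samples)]
        medians[block] = statistics.median(timings)

    ranked = sorted(medians, key=medians.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if medians[best] - medians[runner_up] < min_gap:
        return None  # all samples look alike: back-track or resample
    return best


measure = simulated_oracle("3fa9c1e0")
first = next_block("", 2, 8, measure)       # guesses chars 1-2
second = next_block(first, 2, 8, measure)   # guesses chars 3-4
```

Against a real target the fixed `min_gap` would probably be replaced by a statistical test over many more samples; the sketch only shows where the back-tracking decision would sit.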

andresriancho commented 8 years ago

Introduction

The main equations that drive this attack use the following definitions:

c := the size of the character set of the target string
n := the total length of the target string

Brute force: c^n trials (usually infeasible; breaking the system this way can take an astronomical amount of time)

Timing attack in a perfect environment: c * n trials (usually also infeasible, because of noise)

Realistic timing attack: c^t * n/t * l trials, where:
- t << n, and the c^t candidates can be generated in reasonable time
- l is the number of repeated measurements needed to reduce the error caused by noise and distinguish a valid trial from an invalid one

By carefully selecting t, a timing attack can be performed. t should be big enough to produce a statistically significant difference over the variance in network delay, and small enough to execute the attack in reasonable time. Statistical approaches such as hypothesis testing (null versus alternative hypothesis) are some of the means to analyze the timing attack results.
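
As a rough illustration of the three trial counts above, the sketch below plugs in example values. The numbers (a 16-symbol hex alphabet, a 32-char key, t = 4, l = 1000) and the function names are assumptions chosen for the example, not figures from the talk. Rounding n/t up when t does not divide n is also an assumption.

```python
def brute_force_trials(c, n):
    """c^n: every possible string of length n."""
    return c ** n


def ideal_timing_trials(c, n):
    """c * n: one char at a time, no noise."""
    return c * n


def realistic_timing_trials(c, n, t, l):
    """c^t * n/t * l: blocks of t chars, l measurements per candidate."""
    blocks = -(-n // t)  # ceil(n / t), an assumption for uneven splits
    return c ** t * blocks * l


c, n, t, l = 16, 32, 4, 1000  # example values only

brute = brute_force_trials(c, n)                 # 16^32 == 2^128
ideal = ideal_timing_trials(c, n)                # 512
realistic = realistic_timing_trials(c, n, t, l)  # 524288000

print(f"brute force: {brute:.3e} trials")
print(f"ideal timing attack: {ideal} trials")
print(f"realistic timing attack: {realistic} trials")
```

With these values the realistic attack costs about a million times more requests than the noise-free one, yet stays many orders of magnitude below brute force. The cost grows quickly with t, which is why t has to stay small.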

Source

https://appsecusa2015.sched.org/event/3VgT/practical-timing-attacks-using-mathematical-amplification-of-time-difference-in-operator

andresriancho commented 8 years ago

This video covers similar ideas: first attack with a known API key, analyze which statistical analysis model fits best, then try to guess new ones.

andresriancho commented 8 years ago

Number of samples

If in doubt just take more samples

The number of samples should also be part of the exploitation algorithm. More samples are going to increase precision.

I would start with a sample count of 1000 and check whether the known differences can be discovered in that scenario. If not, try again with 5k, 10k, and then 25k.

Make the max number of samples 150k (by default) and let the user change it.
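
The escalation described above could look roughly like this. It is a sketch, not pico's actual code: `can_distinguish` is a hypothetical callback that collects the given number of samples and reports whether the known timing difference was detected. Making one last attempt at the maximum is also an assumption, since nothing above says what happens between 25k and the limit.

```python
DEFAULT_SCHEDULE = (1_000, 5_000, 10_000, 25_000)
DEFAULT_MAX_SAMPLES = 150_000  # default limit, meant to be user-configurable


def sample_counts(schedule=DEFAULT_SCHEDULE, max_samples=DEFAULT_MAX_SAMPLES):
    """Yield increasing sample counts, never exceeding max_samples."""
    last = None
    for count in schedule:
        if count > max_samples:
            break
        last = count
        yield count
    # Assumption: finish with one attempt at the configured maximum.
    if last != max_samples:
        yield max_samples


def find_sample_count(can_distinguish, schedule=DEFAULT_SCHEDULE,
                      max_samples=DEFAULT_MAX_SAMPLES):
    """Return the first sample count that works, or None if none does.

    `can_distinguish(count)` is a hypothetical callback supplied by the
    caller; it should take `count` samples and return True on success.
    """
    for count in sample_counts(schedule, max_samples):
        if can_distinguish(count):
            return count
    return None


# Toy usage: pretend the difference only shows up from 10k samples on.
needed = find_sample_count(lambda count: count >= 10_000)
```

Returning None leaves it to the caller to decide whether to report that no timing side channel was found or to ask the user for a higher limit.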

andresriancho / pico

Exploitation Algorithm #28
