cds-snc / covid-alert-server

Exposure Notification: Diagnosis Server implementation / Notification d’exposition : Mise en œuvre du serveur de diagnostic
Apache License 2.0
298 stars 31 forks source link

BVC-55: Race condition in bruce force attack pattern against claim OTC key path. #121

Open maxneuvians opened 4 years ago

maxneuvians commented 4 years ago

General Information Function: claimKey() Location: covid-shield-server/pkg/server/keyclaim.go Path: /claim-key

Logic of claimKey() Check how many attempts the IP has left based on the current number of failure associated with the IP

https://github.com/cds-snc/covid-shield-server/blob/4518d8d8bca083e450a459e8a3bcac8ef5be83e0/pkg/server/keyclaim.go#L115

If the IP has 0 attempt left → return HTTP 429 Too Many Requests <Initially, a user has 8 attempts available>

https://github.com/cds-snc/covid-shield-server/blob/4518d8d8bca083e450a459e8a3bcac8ef5be83e0/pkg/server/keyclaim.go#L119

Reads the data from the HTTP request received

https://github.com/cds-snc/covid-shield-server/blob/4518d8d8bca083e450a459e8a3bcac8ef5be83e0/pkg/server/keyclaim.go#L127

Tries to claim the one time code read from the data read in previous step

https://github.com/cds-snc/covid-shield-server/blob/7d0a103f6b5ca2b66a5598a21c2ce3f9cbf9b3b7/pkg/persistence/queries.go#L65

Verify that the application public key isn't already in the encryption_keys table

Verify that the one time code is valid <In this use case, this is the last step because the one time code isn't valid>

If the one time code isn't valid

The number of failure associated with the IP is increased.

https://github.com/cds-snc/covid-shield-server/blob/b10ebdf7a7f926f24526ad1d2b485389c5536e29/pkg/persistence/queries.go#L384

Return HTTP 401 unauthorized

If the one time code is valid

https://github.com/cds-snc/covid-shield-server/blob/4518d8d8bca083e450a459e8a3bcac8ef5be83e0/pkg/server/keyclaim.go#L190

return the server public key to send keys to the server using /upload

Race Condition Description

The race condition takes advantage of the execution time in between step 1 and step 5a. The number of failure in between those step is assume to stay the constant. Since more than a thread can be executing claimKey() at the same time, this assumption leaves a possibility that the value won't be updated before another thread accesses the database to verify if the same IP can try to get a valid one time key step 2. Since there are 2 database queries before updating the number of failures, it leaves enough time for a user to send more than a request simultaneously. This would give the user more chances to guess a valid one time code.

Proof of concept Two methods were use to successfully exploit the race condition. The POC was written in python. Multi processing is used instead of multi threading in order to avoid the global interpreter lock (GIL). In other languages, a multi threaded solution should work.

First method (Most reliable) Current setup: 7 processes Do sequentially 7 requests to /claim-key <available attempt - 1> Create a number of processes Synchronize all the processes Send a request to /claim-key with each process created at step 2

Second Method (Best results) Current setup: 25 processes Create a number of processes. Synchronize all the processes Send a request to /claim-key with each process created at step 1

obrien-j commented 4 years ago

Marking this as low pri, and will discuss a proper fix for after launch, based on presence of AWS WAF rate limit rules as well.

Fixing this will require some locks that may potentially introduce other instability into the ingress flow so pausing for now.

zqureshi commented 3 years ago

Looked into this a bit, I have written down some thoughts here.

CalvinRodo commented 3 years ago
maxneuvians commented 3 years ago

Implement leaky bucket once redis is available