18F / identity-idp

Login.gov Core App: Identity Provider (IdP)
https://secure.login.gov/
Other
524 stars 112 forks source link

Fix flaky `RECAPTCHA_SIGN_IN` A/B test #11264

Closed jmhooper closed 1 month ago

jmhooper commented 1 month ago

The RECAPTCHA_SIGN_IN A/B test has a discriminator block that does the following:

  1. Return nil if captcha was not performed
  2. Return the user's UUID if the user is present
  3. Return a random value if no user is present with the intent of sorting them into a bucket

The implementation of AbTest#bucket checks if the discriminator resolved by this discriminator block is blank and returns nil if it is (ref).

The blank? method has a special implementation for String (ref) that returns true if the string consists of only whitespace.

The random value here was generated using SecureRandom.gen_random. This value was then used in the AbTest#bucket to randomly sort unidentified users into AB test buckets. SecureRandom.gen_random returns a string of random bytes which can include " ", "\n", "\r", etc.

The combination of the above meant that the discriminator could generate a value that caused the AbTest#bucket to return nil for RECAPTCHA_SIGN_IN when it was intended to return a bucket. This lead to a flaky test (this spec tests that the bucket is not nil; it is nil in the case where SecureRandom.gen_random returns a blank string).

This commit changes the call to SecureRandom so that it only ever generates alphanumeric strings which will never be blank.

aduth commented 1 month ago

The implementation of AbTest#bucket checks if the discriminator resolved by this discriminator block is blank and returns nil if it is (ref).

Alternatively, maybe this should check .nil? ? I don't know if we're expecting other values to follow this code path (e.g. false, empty string '')

jmhooper commented 1 month ago

I figured it was safest to change the implementation in the ReCaptcha bucket to avoid any possible side-effects. I'm not sure why we use #blank? and not #nil?, but it looks like it has been there since the beginning.

Another thought is that SecureRandom.alphanumeric(1) has a much smaller set of values it can generate. That could lead to issues where you don't see anybody sorted into buckets with small percentages. WDYT of changing that to something like SecureRandom.alphanumeric(8)?

aduth commented 1 month ago

That seems fine, yeah 👍