mark sensitive variables?

stef commented 4 months ago

Certain information created, exchanged, and processed in OPAQUE is sensitive. Specifically, all private key material and intermediate values, along with the outputs of the key exchange phase, are all secret. Implementations should not retain these values in memory when no longer needed.

would it make sense to mark all variables explicitly that are considered sensitive? such information would very well fit in the boxes for the functions where we list input/output/exceptions and the pseudo-code.

stef commented 4 months ago

red/black

kevinlewi commented 3 months ago

I don't know if there is actually a way to do this with colors, in a way that also appears in the .txt version of the draft with no markup. ccing @chris-wood in case he has any thoughts, but otherwise closing...

stef commented 3 months ago

red/black is the name of the technique to mark sensitive and "insensitive" data, it doesn't have to be with colors in the document. it might as well be just a table with all the variables that are sensitive, and need proper protection from leaking (like with mprotect) and to be sanitized (like with bzero).

stef commented 3 months ago

or as i originally mentioned we could mark these directly in the functions with the pseudo-code, just giving it an extra section in there:

sensitive variables: foo, bar, baz

kevinlewi commented 3 months ago

Oh, I see what you mean. So, this could be done, but I am a bit hesitant because I feel like it might be a bit too redundant / detract from the presentation. Note that all intermediate variables in a function are considered sensitive. So for instance if we take a look at doing this for the Store() function in Envelope Creation:

def Store(randomized_password, server_public_key, server_identity, client_identity):
  envelope_nonce = random(Nn)
  masking_key = Expand(randomized_password, "MaskingKey", Nh)
  auth_key = Expand(randomized_password, concat(envelope_nonce, "AuthKey"), Nh)
  export_key = Expand(randomized_password, concat(envelope_nonce, "ExportKey"), Nh)
  seed = Expand(randomized_password, concat(envelope_nonce, "PrivateKey"), Nseed)
  (_, client_public_key) = DeriveDiffieHellmanKeyPair(seed)

  cleartext_credentials =
    CreateCleartextCredentials(server_public_key, client_public_key,
                               server_identity, client_identity)
  auth_tag = MAC(auth_key, concat(envelope_nonce, cleartext_credentials))

  Create Envelope envelope with (envelope_nonce, auth_tag)
  return (envelope, client_public_key, masking_key, export_key)

pretty much every variable there that is not an input or output would be enumerated as "sensitive" (e.g. auth_key, cleartext_credentials), and arguably auth_tag, envelope as well since they are intermediate even though they are also included as part of the output.

And for a function like AuthServerRespond:

def AuthServerRespond(cleartext_credentials, server_private_key, client_public_key, ke1, credential_response):
  server_nonce = random(Nn)
  server_keyshare_seed = random(Nseed)
  (server_private_keyshare, server_public_keyshare) = DeriveDiffieHellmanKeyPair(server_keyshare_seed)
  preamble = Preamble(cleartext_credentials.client_identity,
                      ke1,
                      cleartext_credentials.server_identity,
                      credential_response,
                      server_nonce,
                      server_public_keyshare)

  dh1 = DiffieHellman(server_private_keyshare, ke1.auth_request.client_public_keyshare)
  dh2 = DiffieHellman(server_private_key, ke1.auth_request.client_public_keyshare)
  dh3 = DiffieHellman(server_private_keyshare, client_public_key)
  ikm = concat(dh1, dh2, dh3)

  Km2, Km3, session_key = DeriveKeys(ikm, preamble)
  server_mac = MAC(Km2, Hash(preamble))

  state.expected_client_mac = MAC(Km3, Hash(concat(preamble, server_mac)))
  state.session_key = session_key
  Create AuthResponse auth_response with (server_nonce, server_public_keyshare, server_mac)
  return auth_response

The intermediate variables are server_keyshare_seed, server_private_keyshare, preamble, dh1, dh2, dh3, ikm, Km2, Km3, session_key, etc.

It seems a bit too cumbersome, no? I kind of prefer just the overall statement we currently have, and leave it to implementors to make sure that the implemented functions follow good practices for writing secure code, such as zeroing out memory after the variables drop out of scope.

stef commented 3 months ago

just because something is intermediate variable doesn't mean it's sensitive (red).

pretty much every variable there that is not an input or output would be enumerated as "sensitive" (e.g. auth_key, cleartext_credentials), and arguably auth_tag, envelope as well since they are intermediate even though they are also included as part of the output.

actually i think in that function cleartext_credentials and auth_tag (maybe envelope but that is not clear to me and needs effort to figure out) are the only black variables, everything else is red in Store() and exactly that effort to figuring out if envelope is red or black could be spared.

in the 2nd example AuthServerRespond() i would naively argue that preamble for example is not sensitive (black) (unless identities are to be hidden from 3rd parties), also server_mac seems to be black (but there might be some weird authentication attack where the server_mac can be used maliciously?)

so as you can see even i am not entirely sure which of those few is actually sensitive or not without looking deeper.

i was thinking, like in papers, we usually mark secret values with lower-case letters and public values with upper-case letters, that would be a non-intrusive way to signal sensitivity of intermediary variables. what do you think? i would volunteer to make a proposal PR for such a change...

kevinlewi commented 3 months ago

The original guidance that we have in the draft, from the text you quoted, states that every intermediate variable is indeed sensitive:

Certain information created, exchanged, and processed in OPAQUE is sensitive. Specifically, all private key material and intermediate values, along with the outputs of the key exchange phase, are all secret. Implementations should not retain these values in memory when no longer needed.

And I agree, it is difficult to examine a specific variable and determine whether or not it would be a security issue if it were not erased from memory. So instead we go with the recommendation to treat everything as sensitive, and I think this is probably the best approach, even if it is being a bit aggressive in some cases. So I would opt to still not change the draft text as it stands, since the original guidance captures it well enough in my opinion.

cc: @hugokraw @chris-wood @bytemare for thoughts

chris-wood commented 3 months ago

At this point I am strongly opposed to non-essential editorial changes to the document.

stef commented 3 months ago

maybe for the future, i think this does help implementers, and also reviewers if there is a distinction between such and such intermediary variables.

hugokraw commented 3 months ago

+1

On Thu, May 30, 2024, 17:45 Christopher Wood @.***> wrote:

At this point I am strongly opposed to non-essential editorial changes to the document.

— Reply to this email directly, view it on GitHub https://github.com/cfrg/draft-irtf-cfrg-opaque/issues/453#issuecomment-2140910525, or unsubscribe https://github.com/notifications/unsubscribe-auth/AICFFXU5D7AQW6OZQZ7PFMLZE6MX3AVCNFSM6AAAAABHBIAB76VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBQHEYTANJSGU . You are receiving this because you were mentioned.Message ID: @.***>

cfrg / draft-irtf-cfrg-opaque

mark sensitive variables? #453