Rewrite encrypted request/response logic and constants

alex-miller-0 commented 1 year ago

The code handling encrypting/decrypting requests/responses is a total mess. I could do a whole writeup on what we are doing wrong, but here are the highlights:

We are erroneously using a fixed constant for encrypted requests and responses (ENC_MSG_LEN). This constant is defined based on the largest response type (1646 bytes) and adds some metadata. However, the metadata is completely wrong and the result is 1728 bytes, which doesn't map to anything useful. Instead, we should be breaking out more unique constants and using them according to the Lattice spec (see writeup below)
Firmware will evaluate the decrypted data according to the encrypted route's request data size. This just so happens to be smaller than the 1728 bytes we always send, so our requests work incidentally, but an outside viewer would find this connection incredibly confusing... because it is wrong.

Requests

All messages to Lattices should look like this:

0x01 (version) | 0x02 (secure request type) | random_id (4 bytes) | payload_len (2 bytes) | payload (1701 bytes) | checksum (4 bytes)

Where the checksum is over the 1701 payload bytes.

We should only need to worry about constructing the payload , which is formatted as:

request_type (1 byte) | data (1700 bytes)

where request_type is 0x01 for connect and 0x02 for encrypted requests.

So now we just need to construct the data for our two message types (connect and encrypted).

For connect request data:

pubkey (65 bytes)

Where pubkey is the client's ECDSA keypair's public key.

For encrypted request data:

ephemeral_id (4 bytes) | data (1696 bytes)

Where ephemeral_id is the first 4 bytes of the SHA256 hash of the ECDH shared secret derived from the client's private key and the last ephemeral public key returned from the Lattice and data is a fully encrypted payload (i.e. the raw data is right-padded to 1696 bytes and then encrypted). Encryption is done using AES-CBC-256 and using the full ECDH shared secret.

For encrypted request data, we should always create a buffer using the full 1630 bytes, copy our data into it, and encrypt the entire envelope. That's it!

Responses

Response messages are of form:

0x01 (version) | 0x00 (response type) | id (4 bytes) | data_len (2 bytes) | data (<=3393 bytes) | checksum

Where id can be thrown away by the requester (SDK), data is variable sized, and checksum is over the entire preceding message (including metadata). This deserialization is done (correctly) in parseLattice1Response.

As before, all we need to worry about is deconstructing that data param.

For connect responses, data is well defined as:

response_code (1 byte) | pair_status (1 byte) | ephem_pub (65 bytes) | fw_version (4 bytes) | enc_wallet_data (144 bytes)

This is already implemented correctly in decodeConnectResponse

For encrypted responses, data is of form:

response_code (1 byte) | encrypted_data (3392 bytes)

Note that encrypted_data is twice the size it should be (1696) because of a bug in a firmware type. However, only the first 1696 will ever be non-zero, so we should just slice from 0-1696.

The encrypted data should, at this point, be decrypted using the shared ECDH secret and the individual handler should be called based on the first byte of the decrypted data, which is an index of what type of encrypted response data this is.

KevinVitale commented 1 year ago

This really doesn't seem "bad", or at least, not something I'd consider a "show stopper" worth immediately fixing.

I acknowledge this would mean incoming messages would be slimmer, but having just written the encrypt/decrypt flow for my own SDK (twice within the last 3 mos 😭), the checksums returned currently do validate on the client-side.

[Side Note: the javascript version inexplicably ignores the returned checksum for every request it gets back. Not sure why?]

Knowing the expected size of the message's payload is all that's really needed. While fixing this would stop client-side code from having to hardcode payload sizes, client-side code is already so tightly coupled to firmware changes.

It would be better to solve the issue of how to decouple the two architecturally first, before change something as delicate as the encrypt/decrypt flows. It's trivially consequential within the current reality of how firmware updates affect SDK implementations.

I noticed you labeled this issue as "invalid". Does that mean you agree?

KevinVitale commented 1 year ago

Also, id absolutely cannot be thrown away 🙃

GridPlus / gridplus-sdk

Rewrite encrypted request/response logic and constants #518

Requests

Responses