Closed cconstab closed 1 year ago
The initial notifications fail to decrypt because the when notifying with value, the notification reaches first to the receiver's atsign, but the encrypted shared_key (which is needed to decrypt the value on receiver end) gets sync'ed to the sender cloud secondary (from the sender's client) at a later point of time. Hence receiver fails to find the shared_key which leads to pad block errors.
Attaching the client-server logs.(Sender - sitaram and Receiver - murali) pad-block-issue-logs.zip
@cconstab as a partial mitigation workaround we could have the sender start (before sync) by sharing something (anything) with the other atSign, then waiting for sync.
@kalluriramkumar what are your thoughts so far re a fix? We can discuss in tomorrows early sync
Also - we should be able to do better than just "pad block" - can we improve the exception handling here in terms of intent and chaining and explanation? I'd like to see something like, "Failed to decrypt value shared by atSign because we have not yet received the encryption shared key from them"
As a side note... I see this odd cached key in the public space which I do not think it should ever be public.
cconstab@cally:~$ openssl s_client -tls1_2 -ign_eof -quiet e32bbf28-4350-53eb-a7b2-0a08829d485f.swarm0002.atsign.zone:3644
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = e32bbf28-4350-53eb-a7b2-0a08829d485f.swarm0002.atsign.zone
verify return:1
@scan
data:["cached:publickey@iot_device02","cached:publickey@iot_manager","publickey","publickey@iot_device01","signing_publickey@iot_device01"]
@
error:AT0003-Invalid syntax : invalid command
@read:errno=0
cconstab@cally:~$
I wondered if the key is actually in place but not being found as it is here ?
Also - we should be able to do better than just "pad block" - can we improve the exception handling here in terms of intent and chaining and explanation? I'd like to see something like, "Failed to decrypt value shared by atSign because we have not yet received the encryption shared key from them"
Sure Gary. will implement this.
Also - we should be able to do better than just "pad block" - can we improve the exception handling here in terms of intent and chaining and explanation? I'd like to see something like, "Failed to decrypt value shared by atSign because we have not yet received the encryption shared key from them"
This is implemented in the following PR-621. This specific changes are in encryption_util.dart file and notification_service_impl.dart-> _internalNotificationCallback in the PR.
The changes related to exception chaining are completed and need to merge it trunk branch. Futher @gkc proposed to write the shared_key directly to remote secondary instead of sync to expedite the sync'ing of shared key. Hence carry forwarding the issue to PR-43
Any update on this issue ?
Any update on this issue ?
@cconstab : The changes related to the exception chaining are merged and published. To expedite the sync'ing of the shared_key, we have approach ready. But it needs a minor change in sync_conflict_resolution code. Hence wanted to check with @gkc before pushing the changes.
@cconstab : But it needs a minor change in sync_conflict_resolution code. Hence wanted to check with @gkc before pushing the changes.
@kalluriramkumar What do I need to review?
@gkc: When sync'ing the shared key to remote secondary directly, the key gets sync'ed back to local secondary (because server is ahead of the local). At this point, failure occurs in sync conflict resolution because it tried to decrypt the data (we do not encrypt the shared_key).
As part of fixing other bug @murali-shris, added logic in sync conflict resolution to check if isEncrypted is set to true before decrypting the data should solve my issue as well. I will add the changes back and run tests. If i still see the issue, will bring up the discussion. At the moment i good to proceed further.
We (@gkc @VJag @murali-shris and @kalluriramkumar) discussed this on Thursday Oct 13, @Vjag will review where we are and how we will fix this properly. @VJag when you are ready, please can you add (into this ticket) your assessment of where we are and recommendations for what we need to do
@gkc and @VJag had follow-up discussion today.
In summary, we need to break down this important lifecycle (delivery of shared encryption key to recipient) into several parts
Steps 1 and 2 can be encapsulated into a client-side 'getSharedEncryptionKey' method, which must have a mutex to prevent race conditions, and which must throw appropriate exceptions if (1) there is no existing key which has been shared and (2) the public key of the recipient is not available so cannot encrypt a new shared key, or the key has not yet been delivered to the sender's cloud secondary (and possible collision detection managed)
As long as encrypting code, whether sending notifications or sharing data, always uses this getSharedEncryptionKey method, then we can be assured that the recipient will always have the symmetric key available to them when they need it, as notification delivery sequence will ensure that the auto-notify of the symmetric key creation is delivered before any other notifications. This will enable us to remove the current workaround which involves including the full shared key in the metadata of data shared via updates, and as a result payloads will see a large decrease in size
The collision management aspect of this is what requires a sender to not use a shared encryption key until it has been delivered to the sender's cloud secondary.
We discussed other aspects as well, which we will return to in the future:
Burned 3SP in PR47. Moving to PR48. Estimate 8SP for implementation including extensive testing to ensure we have a rock-solid solution.
@VJag if you think this is 13SP please update the label
Notes from the meeting that I and Gary had on the potential solutions:
Sol 1: Make the shared encryption key part of the payload Conclusion: It was concluded that making the shared encryption key part of the payload is not the best way to solve the problem as it increases the payload size. When the key is cached and accessible to the recipient sending it repeatedly as part of the payload can be avoided.
Sol 2: Shared key generation process in the client (SDK) should only return a value when it synced to the cloud secondary Conclusion: This solution ensures that the shared encryption is definitely available to the recipient as the one who is sharing is ensuring that the key reaches the cloud secondary at the minimum.
However, there can be a timing issue when a notification sent reaches the recipient and then the client goes offline. At that point, though the shared key is available in the sender or the recipient's cloud server as long as it is not in the recipient's local secondary it will still result in a decryption issue.
A general issue with shared key generation is two clients for an atSign generating it in parallel. At this point, one of them will override the other after a sync. We were discussing ways to perform conflict resolution in the server for this scenario.
8SP is what we can spend in this sprint. The time needed on this ticket is, to find the solution and implement it.
Another solution approach I have in my mind is:
"A sender continues to send the shared AES encryption key as part of the payload till the time he sees a notification/cached key from the recipient to indicate that the shared encryption key is available to the client"
Let's say this confirmation key is "cached:@gary:share_key_accessible@jagan". So @jagan will make the shared key part of the payload as long as he does not see this key. When the recipient sees that the shared key is synced to the client and is available for decryption, a confirmation in the form of a cached key is sent. Now the recipient on seeing this will stop sending the key in the payload.
Continuing to send key in payload fixes one problem but does not address the issue regarding collision when more than one sender client is in play, and both of them create a symmetric encryption key (and we assume there is only one in the data store)
Yes, Gary. Though it is not elegant, it does solve the two-client problem. The explanation would be when the key is part of the payload, the one from the payload should be used for decryption. So it does not matter if the two clients would have generated two keys as the keys are still decryptable. However, we still have a problem where the recipient might get two cached shared keys at two different points in time (The single shared key). I will try to explain this better in the arch call tonight.
True. But we need to agree that either (1) we will have a single shared encryption key per atsign pair (current intended behaviour) or (2) we can have multiple shared encryption keys per atsign pair
(2) is where I believe we need to go anyway. If we stick with (1) then we need to make it right
I think we should just stick with one aes key as it's tough to manage 1 let alone X.
So we need some where to offer a mutex for aes creation.. The atServer is well placed I think for that.. More discussion to be
A diagram that shows a way to not have atClients create a new shared AES key with a new atSign for tomorrows arch call or for you to improve on over night..
@cconstab once we've solved this for one AES key, the same pattern will work for many. We can start by making this work for what we have now - one. In future we will need to support at least two for when it becomes time to re-encrypt and re-share as we move to a higher key length or different algorithm or whatever.
We can use the server as mutex indeed that's by far the simplest way to solve the "simultaneous creation of shared aes key by multiple clients" problem. It is worth mentioning though that enabling multiple keys to be used will render this a non issue as each client can cut its own & use a guid as its ID
However : I can't see your diagram on my phone right now, but I don't see how we can cut the key on the server in a way that keeps the key invisible to the server??
We do not cut the key on the server instead we use it to manage a mutex.
Oh ok I thought you were proposing something new. (Can't read diagram on my phone!) Yes that's what Jagan & I were discussing; we have two ways of doing it : (1) server code to handle this specific case (2) introduce concept of variable mutability e.g. in this case we would want write once, no subsequent delete or change (but ttl would hold, for auto deletion)
sequenceDiagram
participant @alicePhone
participant @aliceTablet
participant @aliceSecondary
participant @bobSecondary
participant @bobPhone
@alicePhone ->> @aliceSecondary: lookup:publicKey@bob
@aliceTablet ->> @aliceSecondary: lookup:publicKey@bob
Note left of @alicePhone: AESkey Generated if not available locally<br/> @bobRSApublicKey<br/>used to encrypt<br/>AESkey
Note left of @aliceTablet: AESkey Generated if not available locally<br/> @bobRSApublicKey<br/>used to encrypt<br/>AESkey
@alicePhone ->> @aliceSecondary: @bob :Encrypted atKey textphone@alice "Hello Bob, how are you?" + Encrypted AESkey;
@aliceTablet ->> @aliceSecondary: @bob :Encrypted atKey texttablet@alice "Hello Bob, how are you?" + Encrypted AESkey;
@aliceSecondary ->> @bobSecondary: notify: atKey textphone@alice "Hello Bob, how are you?" + Encrypted AESkey;
@aliceSecondary ->> @bobSecondary: notify: atKey texttablet@alice "Hello Bob, how are you?" + Encrypted AESkey;
@bobSecondary ->> @bobPhone: notify:message atKey textphone@alice "Hello Bob, how are you?" + Encrypted AESkey;
@bobSecondary ->> @bobPhone: notify:message atKey texttablet@alice "Hello Bob, how are you?" + Encrypted AESkey;
Note right of @bobPhone: Inconsitent AES Encryption key for both atKeys coming from @alice
note left of @alicePhone: What is needed is a check and sync if AES key has already been generated
@alicePhone ->> @aliceSecondary: lookup:publicKey@bob
@aliceTablet ->> @aliceSecondary: lookup:publicKey@bob
Note left of @alicePhone: If AES key is not available locally or on secondary mutext set on secondary and created, then updated to secondary <br/> @bobRSApublicKey<br/>used to encrypt<br/>AESkey
Note left of @aliceTablet: If AES key is not available locally or on secondary and mutext cannot be set on secondary then recheck until AES is available <br/> @bobRSApublicKey<br/>used to encrypt<br/>AESkey
Note left of @aliceSecondary: If Mutext is set by client and the AESKey not set after 5 seconds Mutext is released.
@alicePhone ->> @aliceSecondary: @bob :Encrypted atKey textphone@alice "Hello Bob, how are you?" + Encrypted AESkey;
@aliceTablet ->> @aliceSecondary: @bob :Encrypted atKey texttablet@alice "Hello Bob, how are you?" + Encrypted AESkey;
@aliceSecondary ->> @bobSecondary: notify: atKey textphone@alice "Hello Bob, how are you?" + Encrypted AESkey;
@aliceSecondary ->> @bobSecondary: notify: atKey texttablet@alice "Hello Bob, how are you?" + Encrypted AESkey;
@bobSecondary ->> @bobPhone: notify:message atKey textphone@alice "Hello Bob, how are you?" + Encrypted AESkey;
@bobSecondary ->> @bobPhone: notify:message atKey texttablet@alice "Hello Bob, how are you?" + Encrypted AESkey;
Note right of @bobPhone: Consistent AES Encryption key for both atKeys coming from @alice
Colin's diagram is embedded as a mark down. Its the same diagram.
@gkc: Had a discussion with @VJag and the following is the first step in implementation - Ensure the notifications are not triggered unless the encrypted shared is synced to the respective atsign cloud secondary. Can you please confirm if this is fine?
Sounds good, yes
Can we try to finish this off in PR50?
Clearing the estimate; we should re-estimate what it's going to take to complete this. @kalluriramkumar can you lay out all of the major tasks required to complete this work, please?
Clearing the estimate; we should re-estimate what it's going to take to complete this. @kalluriramkumar can you lay out all of the major tasks required to complete this work, please?
@gkc: Based on my discussion with Jagan the contract on the client should be "SDK should prevent a key or a notification to be shared with another atSign till the time the SDK is sure that the shared key is available to the recipient"
For this to be implemented:
On a high level, refactor getSharedKey method follows:
@gkc: Below is the flow chart of the proposed solution:
flowchart TD
Y(((Start)))
Y --> A
A[Put a key]
B[Notify a key with value]
Y --> B
A --> C[Look for encryptedSharedKey]
B --> C
C --> D{is encryptedSharedKey found}
D --> |Found| E{Is encryptedSharedKey synced}
E --> |Synced| K[Return encryptedSharedKey] --> Z
E --> |Not Synced| L[Run Update verb to sync encryptedSharedKey on cloud]
D --> |Not found| F[Generate AES Key]
F --> G{look for cached encryptionPublicKey}
G --> |Found| H[Encrypt the shared Key]
G --> |Not found| I[Run plookup verb to fetch encryptionPublicKey]
I --> |Found| H
I --> |Not found| J[Throw AtPublicKeyNotFoundException] --> Z(((end)))
H --> L
L --> M{is sync successful}
M --> |Yes| K
M --> |No| N[Rethrow exception received when running update verb] --> Z
@kalluriramkumar Can you please link the relevant branch(es) and/or PR(s) to this ticket, and can you also please provide a brief summary of what's left to do?
@kalluriramkumar Can you please link the relevant branch(es) and/or PR(s) to this ticket, and can you also please provide a brief summary of what's left to do?
Completed changes in at_client SDK and pushed the changes to the following branch : initial_pad_block_issue
Pending work: In the persistence secondary, at the moment, we do not have a way to get the latest commit entry for a given key to check if the key is synced to the cloud secondary. Also, the same at_commit_log is used between the at_client SDK and at_secondary server which leads to bugs because a fix to the client SDK bug might have a side effect on the at_secondary server and vice versa. So, to fix the above problems following refactorings are planned:
Completed the code refactorings, dev testing, PR review and merge to trunk are pending.
Thanks @kalluriramkumar. I don't think we need to do estimation poker for this, can you add your own SP estimate here for the remaining effort please?
Following are the draft PR that fixes the issue: https://github.com/atsign-foundation/at_client_sdk/pull/823
The changes are merged to trunk branch
The changes are published in at_client - v3.0.49
confirmed working thank you
Describe the bug When sending the first notification when using atTalk the first messages fail due to pad block failures.
To Reproduce Steps to reproduce the behavior:
Expected behavior The first and any other notifications should be encrypted correctly
Screenshots pair of atSigns talking to each other
and the second
Note the encrypted message is sent
ksM2BBOfcd+Y5SPyiIen+Q==
but the receiver does not know how to decrypt is my assumption ?Were you using an @application when the bug was found? -atTalk
Additional context Add any other context about the problem here.