When the CLI sends a command to all enclaves, it opens a socket with
each enclave process, sends the command to each open socket, and waits
for a confirmation message from each enclave, indicating that the
enclave is still alive and the command has been received. Afterwards,
the CLI starts consuming what's available in each socket's buffer
(namely, the command's reply from each enclave).
The CLI waits for the confirmations using an epoll that monitors each
socket. Currently, after a confirmation has been received, the epoll
keeps monitoring the socket of the "confirmed" enclave. As a result,
the epoll can be triggered by a socket event that carries the command's
reply from the "confirmed" enclave, rather than a confirmation message
from another process. Since we're expecting exactly "n" confirmation
messages, this leaves the operation in an invalid state. Removing the
socket fd from the epoll after the first received message (which should
be a confirmation) fixes this issue.
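
To illustrate the idea, here is a minimal sketch in Python. It is not the CLI's actual code: it uses the standard `selectors` module (epoll-backed on Linux) and local socket pairs in place of real enclave connections. The one-byte `ACK` marker and the names `wait_for_confirmations` and `demo` are hypothetical and chosen only for this example.

```python
import selectors
import socket

ACK = b"\x01"  # hypothetical one-byte confirmation marker (not the CLI's real format)


def wait_for_confirmations(socks, timeout=1.0):
    """Return once every socket has delivered its first message, an ACK."""
    sel = selectors.DefaultSelector()
    for sock in socks:
        sel.register(sock, selectors.EVENT_READ)
    pending = set(socks)
    confirmed = 0
    try:
        while pending:
            events = sel.select(timeout)
            if not events:
                raise TimeoutError("an enclave did not confirm in time")
            for key, _mask in events:
                sock = key.fileobj
                # Stop monitoring this socket right away, so a reply that is
                # already queued behind the ACK cannot wake the selector again.
                sel.unregister(sock)
                pending.discard(sock)
                if sock.recv(1) != ACK:
                    raise RuntimeError("first message was not a confirmation")
                confirmed += 1
    finally:
        sel.close()
    return confirmed


def demo(n=3):
    """Simulate n enclaves that send an ACK followed immediately by a reply."""
    pairs = [socket.socketpair() for _ in range(n)]
    try:
        for i, (_cli, enclave) in enumerate(pairs):
            enclave.sendall(ACK)
            enclave.sendall(f"reply-{i}".encode())
        cli_socks = [cli for cli, _enclave in pairs]
        confirmed = wait_for_confirmations(cli_socks)
        # Only after all confirmations arrive are the replies consumed.
        replies = [cli.recv(64) for cli in cli_socks]
        return confirmed, replies
    finally:
        for cli, enclave in pairs:
            cli.close()
            enclave.close()


if __name__ == "__main__":
    print(demo())
```

Because each socket is unregistered as soon as its first message is read, the number of wake-ups counted as confirmations cannot exceed the number of enclaves, even when a reply is already sitting in the buffer.
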
Alongside this fix, I have also removed an unnecessary clone of the
aforementioned socket.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Signed-off-by: Cosmin-Andrei Pletosu <pletosuc@amazon.com>
Issue #, if available: None