cameel opened 6 years ago
@dybi Diagrams representing the two versions of message routing in Middleman that I showed you yesterday:
With one change - we do need to keep track of messages after all. But message IDs will not be inside messages.
This one with queues will be added to the blueprint but I'm pasting both here in a comment in case I need to refer to them later.
Essentially the same as the one that's in the blueprint now but in a slightly different style.
This one looks simpler but really it's just showing less detail which is why I prefer the one with queues - even if the queues remain just a concept rather than actual implementation.
Open a TCP connection to Concent
@cameel, I guess this is a little bug. You meant MiddleMan, didn't you? ;)
Yeah. Fixed.
Update: I have moved wire protocol description from #618 to the blueprint above.
Update:
The blueprint is now complete. There are still small issues that need to be ironed out but no more major rewrites are needed and all the issues that describe the implementation have been created.
Changes:
ClientAuthorization message. There are custom frame types meant just for authentication.
TransactionSigningRequest.
And here's a map showing dependencies between the issues :)
Signing Service
/-----------------> #632
/
#633 ----
\
#625 ---> #623 ---> #599
/
#631 ---/
Middleman
#629 --------\
\
---- #618 ---> #615
/
#630 --------/
\
\-------------> #616
Callback
#635 -----------------------> #632
#636
Update: MiddlemanError replaced with a special ERROR frame.
@cameel
Created using the same key that's used for signing Golem messages.
What should it look like? Should it be done the same way as in golem_messages, where you pass a deserialized message and the keys and get a serialized message as the function output? Or should the message be hashed? If it should be hashed, which method should we use?
I'm not sure I understand what you mean. In both cases the message gets hashed because the signature is always computed using a hash of the content.
When you want to sign a frame you should get the whole serialized content, compute a hash and pass it to the same cryptographic functions Golem uses when it creates a signature (with the same key the service uses to sign its Golem messages).
How you organize all this into functions is a question to @rwrzesien.
Updated wire protocol description:
Updated:
@cameel, in case of ErrorFrame, when the original request_id is not available, 0 should be used.
@cameel Please change the from field name to from_address.
Concent Signing Service
Components
Wire protocol
Signing Service, Middleman and Concent communicate by sending data over TCP connections.
TCP is a stream-oriented protocol which means that the client simply receives a stream of bytes and has to interpret it as messages on its own. We're going to build a message-oriented protocol on top of it. We need the following properties:
One important assumption that greatly simplifies things is that the protocol is not meant to connect random parties. Both parties are expected to have a key pair and to know each other's public keys.
Data frames
To get framing we'll use separators. A separator will be an arbitrary, unique and constant string of bytes. Encountering it means that the current message has just ended - even if it's incomplete - and a new one has started. The data inside the frame should be escaped so that if it happens to contain the string of bytes we use as a separator, the frame does not end prematurely.
For ease of implementation, it's best to use a single character as a separator. Escaping with a multi-byte sequence has many corner cases and may be hard to implement correctly.
We're using separators rather than a field with the overall length of the message because otherwise a frame with malformed length could "eat" all the data following it. This would result in the other side losing track of where subsequent messages start and end. Using a separator ensures that in case of such an error only a single message is damaged and the communication can continue without restarting the connection.
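A minimal sketch of separator-based framing with a single-byte separator. The byte values for the separator and the escape character are assumptions; the blueprint does not fix them. Escaping guarantees the separator never appears inside a frame, so a receiver can split the stream on it and lose at most one damaged frame.

```python
from __future__ import annotations

# Hypothetical byte values; the blueprint does not specify them.
SEPARATOR = b"\x00"
ESCAPE = b"\x01"
ESCAPED_SEPARATOR = ESCAPE + b"\x02"
ESCAPED_ESCAPE = ESCAPE + b"\x03"


def escape(data: bytes) -> bytes:
    # Escape the escape byte first, so sequences produced by the second
    # replacement are not escaped again.
    return data.replace(ESCAPE, ESCAPED_ESCAPE).replace(SEPARATOR, ESCAPED_SEPARATOR)


def unescape(data: bytes) -> bytes:
    # Decode sequentially: every ESCAPE byte starts a two-byte sequence.
    result = bytearray()
    i = 0
    while i < len(data):
        byte = data[i:i + 1]
        if byte == ESCAPE:
            sequence = data[i:i + 2]
            if sequence == ESCAPED_SEPARATOR:
                result += SEPARATOR
            elif sequence == ESCAPED_ESCAPE:
                result += ESCAPE
            else:
                raise ValueError("Invalid escape sequence")
            i += 2
        else:
            result += byte
            i += 1
    return bytes(result)


def split_frames(buffer: bytes) -> tuple[list[bytes], bytes]:
    # Everything before the last separator consists of complete (still
    # escaped) frames; whatever follows it may be incomplete and is kept.
    *complete, remainder = buffer.split(SEPARATOR)
    return [frame for frame in complete if frame], remainder
```

A malformed frame only makes `unescape` fail for that one frame; the next separator still marks where the following frame starts.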
Frame structure
The frame contains the following data:
The header has a constant length. Everything between the end of the header and the beginning of the next separator is the payload. The receiver can't be sure that it has received the whole payload until it gets either a separator or the stream ends.
TCP also provides additional features that increase robustness: checksums, ordering, retransmissions, fragmentation, etc.
Data frames are not encrypted, but they contain a signature that protects the data against tampering. Since we do not really need the communication to be secret, this saves us from having to use a more complex solution like an SSL-encrypted connection.
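A sketch of a constant-length header and of signing the frame content, under loudly stated assumptions: the field order and sizes (1-byte payload type, 4-byte request ID, 65-byte signature) are hypothetical, and SHA-256 stands in for whatever hash Golem uses. The `sign` and `verify` callables are placeholders for the same cryptographic functions Golem uses with the service's key, as described in the discussion above.

```python
import hashlib
import struct
from typing import Callable, NamedTuple

HEADER_FIELDS = ">BI"  # hypothetical: payload type (1 byte), request ID (4 bytes)
SIGNATURE_LENGTH = 65  # hypothetical signature size
FIELDS_LENGTH = struct.calcsize(HEADER_FIELDS)
HEADER_LENGTH = FIELDS_LENGTH + SIGNATURE_LENGTH


class Frame(NamedTuple):
    payload_type: int
    request_id: int
    payload: bytes


def frame_hash(frame: Frame) -> bytes:
    # The signature covers everything except itself. SHA-256 is a stand-in.
    content = struct.pack(HEADER_FIELDS, frame.payload_type, frame.request_id) + frame.payload
    return hashlib.sha256(content).digest()


def serialize(frame: Frame, sign: Callable[[bytes], bytes]) -> bytes:
    signature = sign(frame_hash(frame))
    if len(signature) != SIGNATURE_LENGTH:
        raise ValueError("Unexpected signature length")
    fields = struct.pack(HEADER_FIELDS, frame.payload_type, frame.request_id)
    return fields + signature + frame.payload


def deserialize(raw: bytes, verify: Callable[[bytes, bytes], bool]) -> Frame:
    if len(raw) < HEADER_LENGTH:
        raise ValueError("InvalidFrame")
    payload_type, request_id = struct.unpack(HEADER_FIELDS, raw[:FIELDS_LENGTH])
    signature = raw[FIELDS_LENGTH:HEADER_LENGTH]
    frame = Frame(payload_type, request_id, raw[HEADER_LENGTH:])
    if not verify(frame_hash(frame), signature):
        raise ValueError("InvalidFrameSignature")
    return frame
```

The serialized frame would then be escaped and followed by the separator before being written to the socket.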
Payload types
ERROR
GOLEM_MESSAGE
AUTHENTICATION_CHALLENGE
AUTHENTICATION_RESPONSE
Error codes
InvalidFrame
InvalidFrameSignature
DuplicateFrame
InvalidPayload
UnexpectedMessage
TransactionRejected
which it never should.
AuthenticationFailure
ConnectionLimitExceeded
MessageLost
ConnectionTimeout
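The payload types and error codes map naturally onto enumerations. Only the names come from the lists above; the numeric values and the ERROR payload encoding (2-byte code followed by UTF-8 details) are assumptions. The helper uses request ID 0 when the original request ID is not available, as agreed in the discussion.

```python
from enum import IntEnum
from typing import Optional, Tuple


class PayloadType(IntEnum):
    # Numeric values are hypothetical.
    ERROR = 0
    GOLEM_MESSAGE = 1
    AUTHENTICATION_CHALLENGE = 2
    AUTHENTICATION_RESPONSE = 3


class ErrorCode(IntEnum):
    # Numeric values are hypothetical.
    InvalidFrame = 1
    InvalidFrameSignature = 2
    DuplicateFrame = 3
    InvalidPayload = 4
    UnexpectedMessage = 5
    TransactionRejected = 6
    AuthenticationFailure = 7
    ConnectionLimitExceeded = 8
    MessageLost = 9
    ConnectionTimeout = 10


UNKNOWN_REQUEST_ID = 0  # used when the original request_id is not available


def error_frame(code: ErrorCode, details: str,
                request_id: Optional[int] = None) -> Tuple[PayloadType, int, bytes]:
    """Return (payload type, request ID, payload) for an ERROR frame."""
    payload = code.to_bytes(2, "big") + details.encode("utf-8")
    return (PayloadType.ERROR,
            UNKNOWN_REQUEST_ID if request_id is None else request_id,
            payload)
```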
Authentication protocol
The authentication is done with a single request-response exchange. The server sends a challenge - a random string of bytes of arbitrary length. The client is expected to sign it with its private key and send the signature as a response.
The protocol is very simple thanks to the fact that we do not have to deal with key exchange. We assume that the server knows the public key of the client ahead of time.
To make it even simpler, the challenge and the response are sent directly in the payload section of a protocol frame. They're not wrapped in a Golem message.
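A minimal sketch of the challenge-response exchange. Python's `secrets` module draws from the operating system's cryptographically secure generator, which matches the requirement that the challenge be unpredictable. The 32-byte length is an arbitrary choice, and `sign`/`verify` are placeholders for the Golem signing functions and the client's known public key.

```python
import secrets
from typing import Callable

CHALLENGE_LENGTH = 32  # arbitrary; the protocol allows any length


def create_challenge() -> bytes:
    # Server side: an unpredictable random string of bytes.
    return secrets.token_bytes(CHALLENGE_LENGTH)


def respond_to_challenge(challenge: bytes, sign: Callable[[bytes], bytes]) -> bytes:
    # Client side: the raw signature is the whole AUTHENTICATION_RESPONSE
    # payload; it is not wrapped in a Golem message.
    return sign(challenge)


def is_authenticated(challenge: bytes, response: bytes,
                     verify: Callable[[bytes, bytes], bool]) -> bool:
    # Server side: verify against the client's public key, known in advance.
    return verify(challenge, response)
```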
Authentication is necessary only between the Signing Service and the Middleman. Concent always connects on a separate port that's available only from inside the cluster.
Messages
The following Golem messages will be exchanged using the wire protocol described above:
TransactionSigningRequest
nonce
gasprice
startgas
to
value
data
from
SignedTransaction
nonce
gasprice
startgas
to
value
data
v
r
s
TransactionRejected
nonce
reason (TransactionRejectionReason)
TransactionRejectionReason (enum):
InvalidTransaction
UnauthorizedAccount
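For illustration, the three messages could be modelled as plain dataclasses. These are not the golem_messages classes; the field types are assumptions based on the usual Ethereum transaction fields. Since `from` is a Python keyword, the sender field is named `from_address`, matching the rename requested in the thread.

```python
from dataclasses import dataclass
from enum import Enum


class TransactionRejectionReason(Enum):
    InvalidTransaction = "InvalidTransaction"
    UnauthorizedAccount = "UnauthorizedAccount"


@dataclass(frozen=True)
class TransactionSigningRequest:
    nonce: int
    gasprice: int
    startgas: int
    to: bytes
    value: int
    data: bytes
    from_address: bytes  # "from" is a reserved word in Python


@dataclass(frozen=True)
class SignedTransaction:
    nonce: int
    gasprice: int
    startgas: int
    to: bytes
    value: int
    data: bytes
    v: int
    r: int
    s: int


@dataclass(frozen=True)
class TransactionRejected:
    nonce: int
    reason: TransactionRejectionReason


def to_signed(request: TransactionSigningRequest, v: int, r: int, s: int) -> SignedTransaction:
    # The transaction fields are copied over; only the signature is new.
    return SignedTransaction(request.nonce, request.gasprice, request.startgas,
                             request.to, request.value, request.data, v, r, s)
```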
Sequence of operation
Concent Signing Service
The Signing Service connects to Middleman as a client but then listens for requests coming from Concent via Middleman. The underlying protocol is TCP, and data sent over it is expected to conform to the protocol description above.
The service first responds to an authentication challenge and then is ready to receive TransactionSigningRequests. Each signing request ends either with a rejection or a signed transaction being sent back.
Main loop
The steps above are performed in a loop. Each cycle lasts until the connection ends or drops.
The service only stops when it detects a shutdown signal from the operating system or Ctrl+C from the user.
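A sketch of the main loop, assuming a hypothetical `run_cycle` callable that connects, answers the challenge and serves requests until the connection ends. Connection failures and (hypothetical) `AuthenticationFailed` errors lead to a short wait and a retry. Ctrl+C already raises `KeyboardInterrupt` in Python, and SIGTERM is turned into `SystemExit`, so both escape the loop and stop the service.

```python
import signal
import time
from typing import Callable


class AuthenticationFailed(Exception):
    """Hypothetical: raised when Middleman rejects the challenge response."""


def _raise_system_exit(signum, frame):
    raise SystemExit(0)


def install_signal_handlers() -> None:
    # Make a shutdown signal from the OS end the process like Ctrl+C does.
    # signal.signal() must be called from the main thread.
    signal.signal(signal.SIGTERM, _raise_system_exit)


def run_forever(run_cycle: Callable[[], None], retry_delay: float = 1.0) -> None:
    """Repeat connection cycles until a shutdown is requested.

    Only connection-related errors are retried. KeyboardInterrupt, SystemExit
    and any other exception propagate and end the loop.
    """
    while True:
        try:
            run_cycle()
        except (ConnectionError, TimeoutError, AuthenticationFailed):
            # Authentication failures are treated like connection failures.
            pass
        time.sleep(retry_delay)
```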
Authentication
Immediately after establishing a connection, the service starts listening for incoming messages, and anything other than AUTHENTICATION_CHALLENGE is treated as an authentication failure. The service waits for the challenge only for a limited time, and a timeout is treated as a failure as well.
The challenge is a random string of bytes. Its randomness guarantees that an attacker won't be able to predict it and reuse a previously intercepted message for authentication. For that reason a cryptographically secure pseudo-random number generator must be used.
In response to the challenge, the service sends an AUTHENTICATION_RESPONSE frame containing a digital signature of the random string. The server is expected to send an ERROR frame and terminate the connection if the authentication fails.
Authentication failures are treated the same way as connection failures. The service waits for a moment and tries again.
Connection handler
SignedTransaction or respond with TransactionRejected.
These steps are performed in a loop. Any error in the handler interrupts the handler and the connection.
Message validation
If any of the following is not true, the message is considered invalid:
TransactionSigningRequest?
An invalid message results in an ERROR frame being sent back.
If the message is valid, the service decides whether it's OK to sign it. The following criteria must be satisfied:
If any of them is not satisfied, the service responds with TransactionRejected.
Error handling
The service should deal with failure in the following way:
ERROR frame and start listening for the next message.
Any other error should crash the service. The service should log the exception, send a crash report and exit with an error code. It can expect that it will be automatically restarted.
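The decision logic of the connection handler can be sketched as a pure function. The hooks `is_valid`, `is_allowed` and `sign_transaction` are hypothetical stand-ins for the validation rules, the signing criteria and the actual signer, and messages are plain dicts here. The choice of `InvalidPayload` as the error code is also an assumption. Exceptions from the signer are deliberately not caught, so they crash the service as required.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

Message = Dict[str, Any]
Reply = Tuple[str, Any]


def handle_message(message: Message,
                   is_valid: Callable[[Message], bool],
                   is_allowed: Callable[[Message], bool],
                   sign_transaction: Callable[[Message], Message]) -> Reply:
    if not is_valid(message):
        # Invalid message: answer with an ERROR frame and keep listening.
        return ("ERROR", "InvalidPayload")
    if not is_allowed(message):
        # Valid, but the service refuses to sign it.
        return ("TransactionRejected", message["nonce"])
    return ("SignedTransaction", sign_transaction(message))


def serve(messages: Iterable[Message], send: Callable[[Reply], None],
          is_valid, is_allowed, sign_transaction) -> None:
    # Connection handler loop: exactly one reply per incoming message.
    for message in messages:
        send(handle_message(message, is_valid, is_allowed, sign_transaction))
```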
Middleman
Middleman is a component that routes messages between multiple Concent processes and a single Signing Service process.
There's one, long-lived TCP connection with the Signing Service and a set of short-lived TCP connections with Concent.
For a brief time during authentication, there may be more than one connection with clients claiming to be the Signing Service, but as soon as one of them authenticates successfully, the other connections are terminated.
When Concent wants to communicate with the Signing Service, it establishes a new connection with Middleman and sends a request. Middleman starts a new handler to service the connection (Request Producer). This handler listens until it receives a valid message, adds the message to the Request Queue and goes back to listening. Another handler (Request Consumer) is responsible for sending queued messages to the Signing Service. It also assigns each message a unique number and adds it to the Message Tracker. When a response comes from the Signing Service, the Response Producer uses the number to pair it with the corresponding request and puts it in the corresponding Response Queue. Each Response Queue is serviced by a Response Consumer, which sends the response over the connection the request originally came from.
Initialization
When Middleman starts, it runs two TCP servers on separate ports:
Queues and Message Tracker are initially empty.
Middleman listens on both ports for incoming connections and when one is established, runs the corresponding connection handler.
When the Signing Service establishes a new connection with Middleman, the TCP server passes control first to the authentication handler and, if authentication is successful, to a connection handler that's now responsible for maintaining the connection. In case of connections with Concent the logic is simpler because there's no authentication.
Authentication
Initially Middleman allows multiple connections with the Signing Service as long as none of them is authenticated. For every such connection Middleman runs Signing Service authentication handler. They can all try to pass the authentication challenge but as soon as one succeeds, the other connections are terminated.
When a connection becomes authenticated, Middleman stops accepting new connections on the external port and starts the Signing Service connection handler.
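The "first to authenticate wins" rule can be kept in a small state object. This is a hypothetical sketch, not part of the blueprint: connection IDs are opaque values, and the assumption that new connections are accepted again once the authenticated one ends is inferred from the connection handler description below.

```python
from typing import Hashable, Optional, Set


class AuthenticationState:
    def __init__(self) -> None:
        self.authenticated: Optional[Hashable] = None
        self.pending: Set[Hashable] = set()

    def accept(self, connection_id: Hashable) -> bool:
        # New connections are refused once one of them is authenticated.
        if self.authenticated is not None:
            return False
        self.pending.add(connection_id)
        return True

    def succeed(self, connection_id: Hashable) -> Set[Hashable]:
        """Return the connections that must now be terminated."""
        self.pending.discard(connection_id)
        if self.authenticated is not None:
            # Another connection won the race; this one must go too.
            return {connection_id}
        self.authenticated = connection_id
        losers, self.pending = self.pending, set()
        return losers

    def fail(self, connection_id: Hashable) -> None:
        self.pending.discard(connection_id)

    def disconnect(self, connection_id: Hashable) -> None:
        # Assumption: a new Signing Service connection may authenticate later.
        self.pending.discard(connection_id)
        if self.authenticated == connection_id:
            self.authenticated = None
```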
Signing Service authentication handler
AUTHENTICATION_CHALLENGE frame.
AUTHENTICATION_RESPONSE frame from the service.
ERROR frame and terminate the connection.
ERROR frame and terminate the connection. An invalid or unexpected message is interpreted as a failure.
Signing Service connection handler
The connection handler receives a socket reader and a socket writer. Now it's time to start message handlers:
The handler is responsible for handling errors reported by message handlers and restarting them when they crash.
The handler keeps the connection open until the other side closes it. It only ever closes the connection on its own when the whole application is shutting down.
When the connection ends, the handler stops all the message handlers it has created. Message Tracker and queues are not cleared. Existing connections to Concent are kept open.
Concent connection handler
When Concent establishes a new connection with Middleman, the TCP server passes control to a handler that's now responsible for maintaining it. Middleman keeps track of multiple connections by assigning them unique IDs.
The handler receives a socket reader and a socket writer. It immediately starts a Response Consumer (which gets the writer) and a Request Producer (which gets the reader).
The handler is responsible for handling errors reported by its consumer and producer and restarting them if they crash.
The handler keeps the connection open until the other side closes it. It only ever closes the connection on its own when the whole application is shutting down.
After the connection ends, the handler stops its producer and consumer.
Request Producer
Request Producer receives a socket reader and waits for incoming messages. Each message is added to the Request Queue along with the ID of the connection it came over.
If the producer crashes before it manages to add the message to the queue, the message is lost.
Request Queue
The Request Queue is a synchronization mechanism used for ordering messages coming from multiple Request Producers and storing them in memory.
Every item in the queue consists of:
Request Consumer
Request Consumer receives a socket writer for the connection with the service when it starts.
The handler keeps consuming messages from Request Queue and sending them to the Signing Service.
If the Response Queue corresponding to the connection ID does not exist, the message is dropped silently. Middleman won't be able to deliver the response anyway, so there's no point in even sending the request.
Every message is assigned a unique Signing Service request ID and sent along with it. This ID is added to the Message Tracker along with the corresponding connection ID. This makes it possible for the Response Producer to know which connection should be used to pass the response to Concent.
A message is not removed from the queue until it's successfully sent over the connection. This way the consumer can retry if it crashes and is restarted by the connection handler.
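A single-threaded sketch of one Request Consumer step. The layout of a queue item (connection ID, Concent's original request ID, message) is an assumption, since the list of item fields is missing above; the `send` callable and the shared `request_ids` counter are hypothetical. The item is only peeked at and is removed after a successful send, and the tracker entry is rolled back if sending fails, so a restarted consumer can retry cleanly.

```python
import itertools
from collections import OrderedDict, deque
from typing import Callable, Dict, Iterator


class RequestConsumer:
    def __init__(self, request_queue: deque, response_queues: Dict[int, deque],
                 message_tracker: OrderedDict, send: Callable[[int, bytes], None],
                 request_ids: Iterator[int]) -> None:
        self.request_queue = request_queue
        self.response_queues = response_queues
        self.message_tracker = message_tracker
        self.send = send
        self.request_ids = request_ids  # shared, so IDs stay unique across restarts

    def consume_one(self) -> bool:
        """Process the oldest queued request; return False if the queue is empty."""
        if not self.request_queue:
            return False
        # Peek only: the item stays queued until it has been sent.
        connection_id, concent_request_id, message = self.request_queue[0]
        if connection_id not in self.response_queues:
            # No one could receive the response, so don't send the request.
            self.request_queue.popleft()
            return True
        request_id = next(self.request_ids)
        self.message_tracker[request_id] = (connection_id, concent_request_id)
        try:
            self.send(request_id, message)
        except Exception:
            # Roll back so a retry does not leave a stale tracker entry.
            del self.message_tracker[request_id]
            raise
        self.request_queue.popleft()
        return True
```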
Message Tracker
Message Tracker is a mapping between messages and Concent connections they came over.
Request IDs are used as keys and must be unique. There may be multiple messages coming from the same connection.
Entries are added by Request Consumer and removed by Response Producer.
Each entry has the following information associated with it:
The order in which entries were sent and added to the tracker is preserved. The Signing Service is required to send responses in the same order. If the Middleman receives a response from the Signing Service out of order, all the requests sent before it are considered lost and the corresponding entries removed from the Message Tracker (more about it below).
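The Message Tracker's ordering rule maps well onto an `OrderedDict`: entries are appended in sending order, and a response for a given ID pops every older entry as lost. The stored entry (connection ID plus Concent's original request ID) is an assumption, since the list of associated information is missing above.

```python
from collections import OrderedDict
from typing import Hashable, List, Optional, Tuple

Entry = Tuple[Hashable, int]  # (connection ID, Concent's request ID); assumed layout


class MessageTracker:
    def __init__(self) -> None:
        self._entries: "OrderedDict[int, Entry]" = OrderedDict()

    def add(self, request_id: int, connection_id: Hashable, concent_request_id: int) -> None:
        if request_id in self._entries:
            raise ValueError(f"Duplicate request ID: {request_id}")
        self._entries[request_id] = (connection_id, concent_request_id)

    def pop_response(self, request_id: int) -> Tuple[Optional[Entry], List[Entry]]:
        """Return (matching entry, entries sent before it that are now lost).

        An unknown ID yields (None, []) and leaves the tracker untouched.
        """
        if request_id not in self._entries:
            return None, []
        lost: List[Entry] = []
        while True:
            oldest_id, entry = self._entries.popitem(last=False)
            if oldest_id == request_id:
                return entry, lost
            lost.append(entry)

    def __len__(self) -> int:
        return len(self._entries)
```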
Response Producer
Response Producer listens for messages coming from the Signing Service and adds them to the Response Queues.
The message must come along with an ID. This ID is used to look up the connection ID in the Message Tracker. If there's no corresponding entry, the message is ignored and the processing ends.
Otherwise the producer starts by discarding any preceding messages from the Message Tracker. For each one, an ERROR frame is added as a response to the Response Queue corresponding to that message's connection ID.
Then the response is added to the Response Queue corresponding to its connection ID, and the entry is removed from the Message Tracker.
If there's no queue corresponding to a response (e.g. the connection has already been closed), the response is silently discarded.
Processing of each entry from the Message Tracker should be atomic - i.e. either both the entry is removed and a response is queued or neither queue nor tracker is modified. This is to prevent situations where Concent gets no response or two responses for a single request.
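A sketch of the Response Producer step, assuming the tracker is an `OrderedDict` of request ID to (connection ID, Concent request ID) and each Response Queue is a `deque` keyed by connection ID. The `("ERROR", "MessageLost")` placeholder for lost requests is hypothetical. Removing an entry and queuing its reply happen back to back with no blocking call or `await` in between, which in a single-threaded asyncio program keeps each pair of modifications atomic.

```python
from collections import OrderedDict, deque
from typing import Any, Dict, Hashable

MESSAGE_LOST = ("ERROR", "MessageLost")  # hypothetical placeholder reply


def handle_response(tracker: OrderedDict, response_queues: Dict[Hashable, deque],
                    request_id: int, response: Any) -> None:
    if request_id not in tracker:
        return  # unknown ID: ignore the message
    while tracker:
        oldest_id, (connection_id, concent_request_id) = next(iter(tracker.items()))
        is_match = oldest_id == request_id
        reply = response if is_match else MESSAGE_LOST
        queue = response_queues.get(connection_id)
        # Remove the entry and queue its reply together.
        del tracker[oldest_id]
        if queue is not None:  # connection already closed: discard silently
            queue.append((concent_request_id, reply))
        if is_match:
            return
```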
Response Queue
Each Response Consumer has its own Response Queue. The queue stores messages that should be sent back over a particular connection with Concent.
Every item in the queue consists of:
Response Consumer
There is a separate instance of Response Consumer for each connection with Concent. The consumer receives a socket writer for the connection and has access to a single Response Queue. It keeps consuming messages from the queue and sending them over its connection until Concent closes it.
The message is sent with request ID matching the one the original request from Concent had.
If the consumer crashes, Middleman closes the connection and all the data that was not yet sent to this particular Concent instance is lost.
SCI transaction signing callback
The callback
The callback is a piece of code that is supplied by Concent and runs inside SCI. It receives an object containing an unsigned transaction and is responsible for putting a signature in the object.
It operates in the following way:
TransactionSigningRequest.
TransactionSigningRequest.
SignedTransaction, copy the signature to the transaction object passed to the callback by SCI and return.
The request that triggers the callback
The callback is going to run in the context of a request from a Golem client submitting or retrieving a protocol message. Exceptions raised by the callback should interrupt the request and result in an HTTP 502 response. This way all the incomplete changes get rolled back and the client hopefully tries again.
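A sketch of the callback flow, under assumptions: `send_request` is a hypothetical helper that delivers a request through Middleman and returns the reply as a dict, the transaction object is assumed to expose `nonce`, `gasprice`, `startgas`, `to`, `value`, `data`, `v`, `r` and `s` as attributes, and `SigningServiceError` is a hypothetical exception that the view is expected to turn into an HTTP 502 response.

```python
from typing import Any, Callable, Dict

UNSIGNED_FIELDS = ("nonce", "gasprice", "startgas", "to", "value", "data")


class SigningServiceError(Exception):
    """Hypothetical: aborts the request; the view maps it to HTTP 502."""


def make_signing_callback(send_request: Callable[[Dict[str, Any]], Dict[str, Any]],
                          from_address: bytes) -> Callable[[Any], None]:
    def callback(transaction: Any) -> None:
        # Build a TransactionSigningRequest from the unsigned transaction.
        request: Dict[str, Any] = {"type": "TransactionSigningRequest"}
        request.update({field: getattr(transaction, field) for field in UNSIGNED_FIELDS})
        request["from_address"] = from_address
        response = send_request(request)
        if response.get("type") != "SignedTransaction":
            # TransactionRejected or an ERROR frame: interrupt the request.
            raise SigningServiceError(f"Transaction not signed: {response!r}")
        # Copy the signature into the object SCI passed to the callback.
        transaction.v, transaction.r, transaction.s = response["v"], response["r"], response["s"]
    return callback
```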
Synchronization
It's currently not possible to run two SCI operations in parallel. For that reason each operation should run inside a critical section. This means that at any given time there will be only a single request-response pair going through Middleman. This is an obvious performance bottleneck and hopefully a better solution will be found soon.
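A minimal sketch of such a critical section as a decorator around SCI operations. Note the limitation: `threading.Lock` only serializes threads within one process. Since multiple Concent processes talk to Middleman, a real deployment would need a lock shared between processes (for example one held in a database); the decorator name is hypothetical.

```python
import functools
import threading

_sci_lock = threading.Lock()


def sci_critical_section(func):
    """Run the wrapped SCI operation while holding a process-wide lock."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _sci_lock:
            return func(*args, **kwargs)
    return wrapper
```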