Open cameel opened 6 years ago
@cameel Few quick questions after first reading:
Subtask
table (ie. report_computed_task
as foreign key to StoredReportComputedTask
) is not scalable and normalized. Instead of that we may add subtask_id
column on StoredReportComputedTask
table etc (actually to their abstract).receive-out-of-band
. But I don't think that changes anything in this blueprint.
- Currently when message is immediately rejected and there is a rejection message send back to the client in same HTTP request we don't store this message in any way. Should this now be stored ?
Eventually yes. Right now - maybe.
In the code as implemented currently we're storing them but that's because the table we have was meant to be the receive queue and contain everything sent to the client (marked as delivered). As you see above, the design changed a bit - now the receive queue is not storing actual messages but the information needed to create a messages when a client asks for it. It also only covers messages returned from receive/
and not all possible messages. So we're not storing them in the receive queue any more.
We do want to store them for audit purposes (and maybe for debugging). We'll eventually want to add all sent and received messages to StoredMessage
(or a table for specific mesage type if one exists) and have another table with foreign keys to them that says who received/sent it and maybe some other metadata.
We don't have such a table right now and there's no task to do it yet (there's only #47 which is only about logging) so right now you could store them in StoredMessage
if you think this will help us with debugging when something goes wrong in production but it's not absolutely required.
- I think that adding column for a message type in
Subtask
table (ie.report_computed_task
as foreign key toStoredReportComputedTask
) is not scalable and normalized. Instead of that we may addsubtask_id
column onStoredReportComputedTask
table etc (actually to their abstract).
Each task is associated with a small and strictly defined number of messages that have a special meaning. One TaskToCompute
, one ReportComputedTask
, one or zero AckReportComputedTask
and so on. Say, for example, the requestor sends us two different AckReportComputedTask
instances. Concent obviously rejects the second one but we're still going to want to store it for audit purposes. The subtask table needs to indicate which one is the one that was accepted by Concent. That's what this column is for. It's not saying "this message is associated with this task". It says "this message is the one valid and accepted AckReportComputedTask
that will be resent to the provider and used in all further use cases".
Also note that we're not going to add a column for every message associated with the subtask. Some messages are more important than others. The columns I added there are not just examples. These are the only messages that a subtask needs to know about. So basically TaskToCompute
and ReportComputedTask
(because they're used for auth and to determine deadlines) plus the confirmations/rejections that we need to pass between the provider and requestor. That's all.
It's not a scalability problem because it's not a replacement for the subtask_id
column in stored messages (which should be be added there too). For the vast majority subtask_id
is enough. For some specific messages we have to choose one of many and that's why we have those 1:1 columns (they should be OneToOneField
of course).
And to be completely clear - I see StoredMessage
(and the other message-specific tables like StoredTaskToCompute
) as a dumb message table where the message object does not know why it's being stored. The mutated super subtask-receive-queue table that we have now needs to go. A dumb message storage makes the system more extensible because this way we can tack on various independent special-purpose tables (subtask table, receive queue, audit table) that just refer to already stored messages and give them meaning.
- "Another thing of note is that there are no states for 'forced payment' use case. This use case operates on multiple subtasks and the response is returned immediately so there's no state change here." - just a note, in full flow it returns two messages, first for provider that is returned as response for request, second is waiting for requestor in
receive-out-of-band
. But I don't think that changes anything in this blueprint.
Yeah, the blueprint skims over the payments because they don't need a separate state. I assume that payment requests can come at any moment and are processed immediately. Concent just responds synchronously and maybe adds messages to the receive queue. It does not need to do any long calculations or wait for any client to send or upload anything so there's no state associated with that.
One change is needed though. ForcePaymentCommitted
needs to be added to the list of message types supported by the receive queue. I'm adding it to the description. This list is meant to be exhaustive. The receive queue should not support messages other than those listed there.
BTW: if there are any fields that need to be added or changed in the tables please consult with me first. I think the ones listed here should be enough but if you find something that's missing please let me know before you implement it.
One more thing. Remember that subtask_id
in the message table(s) cannot be a foreign key. We can always receive a message for which a subtask does not exist and we may still want to store that message.
@cameel Ad 1) I think we are talking about two different things here. What I have meant is ie. this piece of code:
if client_message.report_computed_task.task_to_compute.compute_task_def['deadline'] < current_time:
logging.log_timeout(
client_message,
request.META['HTTP_CONCENT_CLIENT_PUBLIC_KEY'],
client_message.report_computed_task.task_to_compute.compute_task_def['deadline'],
)
reject_force_report_computed_task = message.RejectReportComputedTask(
header = MessageHeader(
type_ = message.RejectReportComputedTask.TYPE,
timestamp = client_message.timestamp,
encrypted = False,
)
)
reject_force_report_computed_task.reason = message.RejectReportComputedTask.REASON.TaskTimeLimitExceeded
reject_force_report_computed_task.task_to_compute = client_message.report_computed_task.task_to_compute
return message.concents.ForceReportComputedTaskResponse(
reject_report_computed_task = reject_force_report_computed_task
)
This happens on communication start, so there is no Subtask created yet. And we currently don't store anything when this flow happens. So the question is - should we store it now, like in this example, create Subtask object and related StoredRejectReportComputedTask object ?
@cameel In Subtask Table, You've said that report_computed_task
allow Null
value if Subtask
is in "FORCING REPORT" State.
State diagram updated.
FORCING ACCEPTANCE
state was missingADDITIONAL VERIFICATION
states. The one above should have been called VERIFICATION FILE TRANSFER
Here's the old diagram for reference:
@cameel In Subtask Table, You've said that report_computed_task allow Null value if Subtask is in "FORCING REPORT" State.
Yes. It can be NULL
in the FORCING REPORT
state. Description updated.
State diagram updated. I have added a missing state transition for the case when Concent overrules requestor's decision in the FORCING REPORT state.
@rwrzesien
Ad 1) I think we are talking about two different things here.
No, we're not. I was referring specifically to this. I originally wanted everything to be stored in the queue. Including the messages created in send/
. But at the beginning there were no synchronous responses so it was satisfied in a trivial way - there just weren't any ;) Later they should have been stored but this just was not changed because the implementation went its own way.
This happens on communication start, so there is no Subtask created yet. And we currently don't store anything when this flow happens. So the question is - should we store it now, like in this example, create Subtask object and related StoredRejectReportComputedTask object ?
As I said, the design has changed. Now we do not want to store them in the queue. Queue is only for stuff returned from receive/
. But we may want to store them for audit or debugging. That's a separate thing. Those messages whould be just inserted into the message table without anything linking to them. You may do this right now if you find it useful. If not, we'll probably do it at some point.
@cameel I think we made a mistake with report_computed_task
being Null
in FORCING_REPORT
state. Looks like it should be Null
in FORCING_ACCEPTANCE
instead. At least that's the case in golem-messages v1.12.
No, you can't force the requestor to accept the task if you have not finished it. And if you finished it you must have the report to prove it.
Actually now I see why I initially said it must not be NULL. I confused it with the ack. We don't have the ack yet in the FORCING REPORT
phase but we do have the report and it must not be NULL. I'm changing the spec back.
@cameel,
please update the Subtask
table (due to: https://github.com/golemfactory/concent/issues/329):
dedline
(int
)result_package_size
(int
, nullable) from ReportComputedTask
Updated, but with some alterations:
computation_deadline
instead of deadline
to prevent confusion with next_deadline
. Otherwise one might think that it's the "current" (as opposed to the "next") deadline which is not true. It's just a deadline for the computation phase while next_deadline
is the deadline of the current state. Actually, now that I think about it, we would be better off calling next_deadline
state_deadline
but it's too late to change it now.result_package_size
can be NULL
if and only if report_computed_task
is NULL
. Don't forget to add a validation for that in the model. In views they're not be necessary though because it's an invariant that should never be violated as long as the code is corrrect.State diagram updated after applying suggestions from #432:
VERIFICATION FILE TRANSFER
and ADDITIONAL VERIFICATION
states is now green.ForceSubtaskResultsResponse
(was SubtaskResultsResponse
).SubtaskResultsVerify
.Another update of state diagram. Only small tweaks. See https://github.com/golemfactory/concent/pull/432#issuecomment-399028222
Please add computation_deadline and result_package_size to Subtask
Update: ForceGetTaskResultRejected
is no longer a valid value in ConcentResponseType
. It's not needed and should have never been there. Bug reported by @rwrzesien in #823.
@cameel Subtask
model now has additional field protocol_version
which contains version of messages related to given subtask.
Concent states
Concent server can be thought of as a state machine with regard to any specific subtask. Upon receiving a message in the
send/
endpoint the processing depends only the current state and the content of the message submitted by the client.Here's a diagram that shows all possible Concent states and transitions between them.
Note that the states we're talking about here are Concent states, not subtask/task states. This means that the state reflects Concent's knowledge about the subtask and not necessarily what is happening to it between the provider and requestor. For example if the provider triggers the 'force report computed task' use case, requestor acknowledges it and none of them interact with Concent after that, the subtask stays in the
REPORTED
state even though it may have already been accepted or rejected.To reflect this we can classify the states into two categories:
Another thing of note is that there are no states for 'forced payment' use case. This use case operates on multiple subtasks and the response is returned immediately so there's no state change here.
Subtask table
To keep track of subtask states Concent uses database tables with the following structure:
Subtasks
task_id
requestor_public_key
,task_id
) pairs must be globally unique. Must not beNULL
or blank. A task can have multiple subtasks. Concent always operates on individual subtasks separately. The task ID is kept only for reference.subtask_id
requestor_public_key
,subtask_id
) pairs must be globally unique. Must not beNULL
or blank.requestor
Client
NULL
provider
Client
NULL
state
UNKNOWN
because it represents a state in which Concent does not know about subtask's existence.next_deadline
NULL
in a passive state.computation_deadline
task_to_compute.deadline
. Must not beNULL
.result_package_size
report_computed_task.package_size
.NULL
if and only ifreport_computed_task
isNULL
.task_to_compute
StoredTaskToCompute
NULL
. A subtask always has aTaskToCompute
associated with it.report_computed_task
StoredReportComputedTask
NULL
.ack_report_computed_task
StoredAckReportComputedTask
NULL
if not submitted or generated yet.reject_report_computed_task
StoredRejectReportComputedTask
NULL
if not submitted or generated yet.subtask_results_accepted
StoredSubtaskResultsAccepted
NULL
if not submitted or generated yet.subtask_results_rejected
StoredSubtaskResultsRejected
NULL
if not submitted or generated yet.Clients
public_key
NULL
or blank.StoredTaskToCompute
,StoredReportComputedTask
,StoredAckReportComputedTask
,StoredRejectReportComputedTask
,StoredSubtaskResultsAccepted
,StoredSubtaskResultsRejected
are all tables storing the content of the corresponding Golem messages. Each table has columns for timestamp, signature and all payload fields. Nested data is represented in a relational way (e.g. nested messages should be foreign keys to the corresponding tables). We want to be able to easily and efficiently use message data in queries. We don't want to have to fetch and decode each message individually. We also want to strictly enforce the schema so that we can easily detect changes and provide data migrations for old data.State transitions
State transitions can happen in two ways:
send/
endpoint.Automatic transitions require Concent to actively check subtask states in the background and take actions when the time is up. Note however that even if Concent notices a timeout, the client won't until it receives a response. Due to the nature of the current implementation (Concent is passive and clients query the
receive/
endpoint to get responses) it's possible to achieve the same effect (from client's standpoint) by postponing timeout checks until the client actually asks. That's the approach we're taking for now.So this means that automatic transitions can happen whenever the client contacts one of the three message-related endpoints:
send/
,receive/
orreceive-out-of-band/
. Before doing any processing in these endpoints Concent runs a function that checks timeouts for all subtasks in which the client is involved, updates the subtask table and adds responses to the receive queues.When a timeout is detected Concent may have to perform additional steps to determine the outcome. For example it may have to check if all necessary files have been uploaded. The response added to the queue can depend on the result but once it's in the queue it won't be changed or be automatically discarded.
Receive queues
Each client has two queues associated with it:
Each of them stores information about messages to be returned from the corresponding endpoint. Messages are generated from that data on the fly so that the timestamp is always current. This applies only to the outer messages generated by Concent. If there are nested messages submitted from the outside, their timestamps and signatures are guaranteed to be preserved.
Every queue is associated with a single client and can contain messages associated with any of its subtasks.
Messages are added to the queue as a result of state transitions (see above). Once queued they do not undergo any further processing. They just sit in the queue until the authorized client fetches them.
Messages are returned in the order of their timestamps (oldest first). Note that, due to the nature of the HTTP protocol, if a client fetches multiple messages in parallel the order in which they reach it is not in any way guaranteed even if the server returns them in a specific order.
Messages are never discarded from the queue, even after state transitions. It's the responsibility of the client to decide whether a message is still relevant or not.
A message is returned only once and then marked as delivered. Delivered messages may remain stored on the server for use in further use cases or for audit purposes but there's no way for the client to fetch them again.
Queue internals
The queues are implemented using a single database table with locks to ensure that the process of retrieving a message is atomic.
The structure of the table is as follows:
ReceiveQueue
response_type
ConcentResponseType
enumclient
Client
queue
receive
orreceive-out-of-band
delivered
TRUE
if the client has already fetched the message.subtask
Subtask
created_at
All the columns must not be
NULL
.ConcentResponseType
enumerationForceReportComputedTask
ForceReportComputedTaskResponse
VerdictReportComputedTask
ForceGetTaskResultFailed
ForceGetTaskResultUpload
ForgetGetTaskResultDownload
ForceSubtaskResults
SubtaskResultsSettled
ForceSubtaskResultsResponse
SubtaskResultsRejected
ForcePaymentCommitted
Note that this list contains only those messages that can be added to the receive queue, not all possible messages. Messages that are always returned synchronously from the
send/
endpoint are not included here.The type and the data stored in the
Subtask
table is everything that's needed to create a response. For example if the response is a wrapper for the actual response passed from the requestor, the link to requestor's response is stored in theSubtask
table. If the content of the response depends on the result of some external checks done after a timeout, this result should be stored inSubtasks
. TheSubtasks
table (along with the related tables) is the central place to store everything we need to create a complete response. The information should be stored in a relational form - in database columns - rather than inside serialized Golem message objects.Message storage
Eventually every Golem message handled by Concent should have a corresponding database table. In short term only the most often used messages will get their own tables and the rarely used ones will still be stored as binary blobs in the
StoredMessage
table.Message tables are meant to store all messages no matter what their purpose is. Other mechanisms like subtask table, receive queues or audit storage can build on that by referencing them with foreign keys. None of them should make an assumption that stored messages are for their exclusive use.