Open cameel opened 6 years ago
@cameel
> `verification_request` task Parameters: `compute_task_def` - `ComputeTaskDef` from `TaskToCompute`

In what format is this going to be passed/received? I think you can't pass unserialized objects to Celery tasks.

As a dict. If we can't pass a dict, we'll have to split it into individual parameters.
This is just an implementation detail so I did not want to get too much into specifics. Basically, we want to pass `ComputeTaskDef` (or at least all relevant parts) between Concent and Conductor, and how exactly we achieve this is a different concern. I know it's doable.

Right, I forgot that `ComputeTaskDef` is a dict, not a message, so there is no problem.
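To illustrate the serialization point, here is a minimal sketch. It assumes the payload is a plain dict and that the work queue uses a JSON serializer (the Celery default since version 4). The field values are placeholders, not real task data.

```python
import json

# Hypothetical subset of ComputeTaskDef fields; all values are placeholders.
compute_task_def = {
    "subtask_id": "16356",
    "src_code": "print('render')",
    "extra_data": {"frames": [1, 2]},
    "docker_images": [],
}

# A plain dict made of JSON-compatible values survives a round trip unchanged,
# so it can be passed as a task argument without custom (de)serialization.
wire_format = json.dumps(compute_task_def)
received = json.loads(wire_format)
```

A message object would need an explicit conversion step before it could take the same path.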
Update: Changed the structure of database tables and Celery tasks:
- `VerificationRequest` has been split into two parts: one common to all requests and one meant only for Blender rendering operations.
- `TaskToCompute` to make it more generic.

`VerificationRequest` model
Columns:
- `ComputeTaskDef` that we actually use. Ignore the fields we do not need to run verification. I think we'll need the following ones:
  - `subtask_id`
  - `src_code`
  - `extra_data`
  - `short_description` (might be useful in logs)
  - `working_directory`
  - `performance`
  - `docker_images`
  - `docker run` like Golem does because our code is itself in a container.
- `created_at`: timestamp. Indicates when Conductor has received the request.

`UploadRequest` model
Existence of this object indicates that Concent is expecting a client to upload a specific file.
Columns:
- `verification_request`: foreign key to `VerificationRequest`. Can't be `NULL`.
- `path`: relative path of the file. Relative to the same directory that paths listed in `FileTransferToken`s are relative to. Can't be `NULL`.

`UploadReport` model
Existence of this object indicates that a file has been uploaded to `nginx-storage` and nginx notified Conductor about this fact.
Note that it may be possible for the client to upload a file even when there is no corresponding `UploadRequest`. This can happen if the upload finishes before Conductor receives `verification_request` from the work queue or if the upload is not done in the verification use case (but, for example, in the 'force get task result' use case).
Multiple `UploadReport` instances can exist for the same file if the client uploads it multiple times.
Columns:
- `path`: relative path of the file. Relative to the same directory that paths listed in `FileTransferToken`s are relative to. Can't be `NULL`.
- `upload_request`: foreign key to `UploadRequest`. Can be `NULL` if there's no corresponding request.
- `created_at`: timestamp. Indicates when Conductor has been notified about the upload. Can't be `NULL`.

`verification_request` task
Parameters:
- `compute_task_def` - `ComputeTaskDef` from `TaskToCompute`
- `files` - list of files the provider is expected to upload

`verification_order` task
Parameters:
- `VerificationRequest` model
- `source_file`: Relative path of the .zip file that contains Blender source files for the render.
- `result_file`: Relative path of the .zip file that contains the rendering result received from the provider.

`verification_result` task
Parameters:
- `subtask_id`
- `result`: `VerificationResult` enum
- `error_message`: string
  - `reason != ERROR`
- `error_code`: string
  - `reason != ERROR`

`VerificationResult` enumeration
- `MATCH`
- `MISMATCH`
- `ERROR`
Update: The `UploadRequest` model has been removed since `VerificationRequest` is now always associated with exactly two files and has fields that contain their paths.
`UploadRequest` model
Existence of this object indicates that Concent is expecting a client to upload a specific file.

| Column name | Type | Remarks |
|---|---|---|
| `verification_request` | `VerificationRequest` foreign key | Can't be `NULL`. |
| `path` | string | Relative path of the file. Relative to the same directory that paths listed in `FileTransferToken`s are relative to. Must be unique. Concent core (which sends the request) is responsible for tracking all verification requests and must never generate duplicate paths. Can't be blank or `NULL`. |
`UploadReport` model
Existence of this object indicates that a file has been uploaded to `nginx-storage` and nginx notified Conductor about this fact.
Note that it may be possible for the client to upload a file even when there is no corresponding `UploadRequest`. This can happen if the upload finishes before Conductor receives `blender_verification_request` from the work queue or if the upload is not done in the verification use case (but, for example, in the 'force get task result' use case).
Multiple `UploadReport` instances can exist for the same file if the client uploads it multiple times.

| Column name | Type | Remarks |
|---|---|---|
| `path` | string | Relative path of the file. Relative to the same directory that paths listed in `FileTransferToken`s are relative to. Does not have to be unique. It's technically possible for the client to upload a file twice. |
| `upload_request` | `UploadRequest` foreign key | Can be `NULL` if there's no corresponding request. |
| `created_at` | datetime | Indicates when Conductor has been notified about the upload. Can't be `NULL`. |
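The column constraints above can be tried out with a small sketch. It uses the standard-library `sqlite3` module instead of the project's actual models, and the table and column names (`upload_request`, `upload_report`, `verification_request_id`) are assumptions made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE upload_request (
        id INTEGER PRIMARY KEY,
        verification_request_id INTEGER NOT NULL,
        -- unique, not NULL and not blank
        path TEXT NOT NULL UNIQUE CHECK (path <> '')
    );
    CREATE TABLE upload_report (
        id INTEGER PRIMARY KEY,
        -- deliberately not unique: the same file can be uploaded twice
        path TEXT NOT NULL,
        -- nullable: a report may have no corresponding request
        upload_request_id INTEGER REFERENCES upload_request(id),
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
""")

conn.execute(
    "INSERT INTO upload_request (verification_request_id, path) VALUES (1, 'a/b.zip')"
)


def insert_duplicate_request() -> bool:
    """Return True if a second request with the same path is rejected."""
    try:
        conn.execute(
            "INSERT INTO upload_request (verification_request_id, path) VALUES (2, 'a/b.zip')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_rejected = insert_duplicate_request()

# Two reports for the same path are accepted.
for _ in range(2):
    conn.execute(
        "INSERT INTO upload_report (path, upload_request_id) VALUES ('a/b.zip', 1)"
    )
report_count = conn.execute(
    "SELECT COUNT(*) FROM upload_report WHERE path = 'a/b.zip'"
).fetchone()[0]
```

The asymmetry is the point: request paths are unique, while reports only record that an upload happened.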
Updates for #411 and #413:
- `upload_finished` and `upload_acknowledged` tasks
- `upload_finished` and `upload_acknowledged` to the `VerificationRequest` table.

Update: Added package checksums and sizes to `VerificationRequest`, `blender_verification_request` and `blender_verification_order`.
- In `send_blender_verification_request` add validation if all required data is available.
- Celery routing is not working at all, when running celery worker with `celery -A concent_api worker -l info -Q concent,conductor,verifier` the exception is:
Traceback (most recent call last):
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/app/amqp.py", line 87, in __getitem__
return self.aliases[name]
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/weakref.py", line 137, in __getitem__
o = self.data[key]()
KeyError: 'concent'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/worker/worker.py", line 171, in setup_queues
self.app.amqp.queues.select(include)
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/app/amqp.py", line 188, in select
name: self[name] for name in maybe_list(include)
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/app/amqp.py", line 188, in <dictcomp>
name: self[name] for name in maybe_list(include)
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/app/amqp.py", line 89, in __getitem__
return dict.__getitem__(self, name)
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/app/amqp.py", line 101, in __missing__
raise KeyError(name)
KeyError: 'concent'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/rwr/work/codepoets/golem/venv-concent/bin/celery", line 11, in <module>
sys.exit(main())
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/__main__.py", line 14, in main
_main()
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/celery.py", line 326, in main
cmd.execute_from_commandline(argv)
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/celery.py", line 488, in execute_from_commandline
super(CeleryCommand, self).execute_from_commandline(argv)))
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/base.py", line 281, in execute_from_commandline
return self.handle_argv(self.prog_name, argv[1:])
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/celery.py", line 480, in handle_argv
return self.execute(command, argv)
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/celery.py", line 412, in execute
).run_from_argv(self.prog_name, argv[1:], command=argv[0])
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/worker.py", line 221, in run_from_argv
return self(*args, **options)
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/base.py", line 244, in __call__
ret = self.run(*args, **kwargs)
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/worker.py", line 255, in run
**kwargs)
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/worker/worker.py", line 99, in __init__
self.setup_instance(**self.prepare_args(**kwargs))
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/worker/worker.py", line 105, in setup_instance
self.setup_queues(queues, exclude_queues)
File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/worker/worker.py", line 174, in setup_queues
SELECT_UNKNOWN_QUEUE.strip().format(include, exc))
celery.exceptions.ImproperlyConfigured: Trying to select queue subset of ['concent', 'conductor', 'verifier'], but queue 'concent' isn't
defined in the `task_queues` setting.
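- `report_upload` view could also receive `subtask_id` as a URL parameter for better security.
- Should `report_upload` handle GET or a different HTTP method?
- `report_upload` needs better handling of duplicated requests. Currently I can create an unlimited number of `UploadReport` objects for one file. There should be some kind of distinction whether the currently processed file is `source` or `results`, and there should be exactly one `UploadReport` for each of those. Also, `upload_finished` can be scheduled an unlimited number of times.
- `upload_acknowledged` queries for `VerificationRequest` with the given `subtask_id` and changes `VerificationRequest.upload_acknowledged` to `True`. Shouldn't the query filter by `upload_acknowledged=False`?
- There is a problem with downloading files. According to the way in which we store and build file paths, we have two files for each subtask, i.e. `blender/source/16356/16356.sub_task_id.zip` and `blender/result/16356/16356.sub_task_id.zip`. So the file names are actually the same. This is a problem during download in `blender_verification_order` because the second file replaces the first one.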
> - In `send_blender_verification_request` add validation if all required data is available.

You mean you want to create an issue to do it? This does not sound like something that's in the spec.
> - Celery routing is not working at all, when running celery worker with `celery -A concent_api worker -l info -Q concent,conductor,verifier` the exception is:

This seems to have been a simple bug. See #442. `task_create_missing_queues` was set to `False` in `celery.py` but the queues were not defined, so it crashed. We have switched it to `True` for now and @dybi will define the queues and switch it back in a separate pull request.
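For reference, a sketch of what defining the queues could look like. `task_queues` and `task_create_missing_queues` are standard Celery settings and `Queue` comes from `kombu`. How the project actually loads these settings and whether the queues need custom exchanges or routing keys is not stated in this thread, so treat the fragment as an assumption.

```python
# Sketch of Celery settings, not the project's actual configuration.
from kombu import Queue

# With automatic queue creation disabled, every queue passed to `-Q`
# must be declared here, otherwise the worker raises ImproperlyConfigured.
task_create_missing_queues = False
task_queues = (
    Queue("concent"),
    Queue("conductor"),
    Queue("verifier"),
)
```

With all three queues declared, `-Q concent,conductor,verifier` selects a subset of known queues instead of failing on `'concent'`.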
> - `report_upload` view could also receive `subtask_id` as a URL parameter for better security.

But it would require nginx-storage to decode the token. The URL is intentionally as simple as possible so that nginx does not have to do any significant work.
In this design the responsibility for making sure that no one can overwrite someone else's files lies entirely with the control cluster. The storage cluster is meant to be dumb and accept anything the control cluster orders it to do.

> - Should `report_upload` handle GET or a different HTTP method?

Only POST. GET requests should not be used for actions that change server state. See #208.
> - `report_upload` needs better handling of duplicated requests. Currently I can create an unlimited number of `UploadReport` objects for one file. There should be some kind of distinction whether the currently processed file is `source` or `results`, and there should be exactly one `UploadReport` for each of those.

That's by design. `UploadReport` only says that a file with a specific name has been uploaded. Nothing more. There is not and should not be anything saying why it was uploaded. If a user uploads a file three times, you get three reports (with different timestamps) and that's fine.
Later, when Conductor is checking if the files match `VerificationRequest`, it should fetch the reports and ignore duplicates. If a file has been uploaded multiple times, it's not our business. There's no point in doing that but it's possible.
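A minimal sketch of the "fetch the reports and ignore duplicates" check. The function name and the plain lists of paths are assumptions for illustration. The real code would read the paths from `UploadReport` and `VerificationRequest` rows.

```python
from typing import Iterable


def all_files_uploaded(expected_paths: Iterable[str], reported_paths: Iterable[str]) -> bool:
    """Return True if every expected path was reported at least once."""
    # Converting to a set collapses repeated reports for the same file.
    unique_reports = set(reported_paths)
    return set(expected_paths) <= unique_reports


source = "blender/source/16356/16356.sub_task_id.zip"
result = "blender/result/16356/16356.sub_task_id.zip"

# The source package was uploaded twice, the result package once.
reports = [source, source, result]

both_present = all_files_uploaded([source, result], reports)
only_source = all_files_uploaded([source, result], [source, source])
```

Duplicates change nothing here: only the set of distinct reported paths matters.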
> Also, `upload_finished` can be scheduled an unlimited number of times.

That's a bug. `upload_finished` should be scheduled if and only if:
- before the request either `VerificationRequest` or any `UploadReport` did not exist
- after the request they all exist

This can happen only once.
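The condition can be written as a transition check. This is only a sketch: `UploadState` and its field names are hypothetical, and it assumes "they all exist" means the `VerificationRequest` plus a report for each of the two packages.

```python
from typing import NamedTuple


class UploadState(NamedTuple):
    """Snapshot of what exists for one subtask (hypothetical helper type)."""
    verification_request_exists: bool
    source_reported: bool
    result_reported: bool

    @property
    def complete(self) -> bool:
        return (
            self.verification_request_exists
            and self.source_reported
            and self.result_reported
        )


def should_schedule_upload_finished(before: UploadState, after: UploadState) -> bool:
    """Schedule only on the transition from incomplete to complete."""
    return not before.complete and after.complete


# The last missing report arrives: schedule the task.
first_time = should_schedule_upload_finished(
    UploadState(True, True, False), UploadState(True, True, True)
)
# A duplicate upload arrives when everything already existed: do not schedule.
repeated = should_schedule_upload_finished(
    UploadState(True, True, True), UploadState(True, True, True)
)
```

Because a complete state never becomes incomplete again, the transition can fire at most once per subtask, provided the check and the update run atomically.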
> - `upload_acknowledged` queries for `VerificationRequest` with the given `subtask_id` and changes `VerificationRequest.upload_acknowledged` to `True`. Shouldn't the query filter by `upload_acknowledged=False`?

It will work fine either way. But it would be a good idea to log an error if `upload_acknowledged` is already `True`.
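A sketch of the suggested logging, using a plain dict in place of the `VerificationRequest` row. The function name and the logger name are assumptions.

```python
import logging

logger = logging.getLogger("conductor")


def acknowledge_upload(request: dict) -> bool:
    """Set the flag; log an error and return False if it was already set."""
    if request["upload_acknowledged"]:
        logger.error(
            "upload_acknowledged is already True for subtask %s",
            request["subtask_id"],
        )
        return False
    request["upload_acknowledged"] = True
    return True


# Stand-in for a VerificationRequest row.
request = {"subtask_id": "16356", "upload_acknowledged": False}
first_call = acknowledge_upload(request)
second_call = acknowledge_upload(request)
```

The flag ends up `True` in both cases. The second call only adds a log entry that makes the repeated task visible.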
> - There is a problem with downloading files. According to the way in which we store and build file paths, we have two files for each subtask, i.e. `blender/source/16356/16356.sub_task_id.zip` and `blender/result/16356/16356.sub_task_id.zip`. So the file names are actually the same. This is a problem during download in `blender_verification_order` because the second file replaces the first one.

That's a bug in the implementation. You don't have to store the file on disk with exactly the same name. You can change it so that there's no conflict. Add a prefix or store each file in a subdirectory.
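One way to apply the subdirectory suggestion is to keep the storage-relative directories when building the local path. The helper name and the `/tmp/verifier` download directory below are made up for this sketch.

```python
from pathlib import PurePosixPath


def local_download_path(download_dir: str, storage_path: str) -> PurePosixPath:
    """Keep the storage-relative directories so equal file names cannot collide."""
    relative = PurePosixPath(storage_path)
    # Refuse paths that could escape the download directory.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError("unexpected storage path: " + storage_path)
    return PurePosixPath(download_dir) / relative


source_local = local_download_path(
    "/tmp/verifier", "blender/source/16356/16356.sub_task_id.zip"
)
result_local = local_download_path(
    "/tmp/verifier", "blender/result/16356/16356.sub_task_id.zip"
)
```

Both files keep the name `16356.sub_task_id.zip` but land in different directories, so neither overwrites the other.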
Update: added work queue diagram to the description.
@cameel
> You mean you want to create an issue to do it? This does not sound like something that's in the spec.

It can be an issue, I am just writing this as a general conclusion.

> Only POST. GET requests should not be used for actions that change server state. See #208.

It needs fixing then, I will create an issue for it.

> That's a bug. `upload_finished` should be scheduled if and only if:
> - before the request either `VerificationRequest` or any `UploadReport` did not exist
> - after the request they all exist
>
> This can happen only once.

I will add an issue for this.

> It will work fine either way. But it would be a good idea to log an error if `upload_acknowledged` is already `True`.

I will add an issue for this.

> That's a bug in the implementation. You don't have to store the file on disk with exactly the same name. You can change it so that there's no conflict. Add a prefix or store each file in a subdirectory.

It needs fixing then too, I will create an issue for it.
Description updated for #537 and #520.
@cameel, please update description according to: https://github.com/golemfactory/concent/issues/610
NOTE: This is still work in progress. Diagrams are mostly finished but I'm still working on the text to accompany them and explain the details.
Clusters and services running on them
Containers
geth
rabbitmq
nginx-proxy
nginx-storage
concent-api
verifier
Communication
Positive case
Negative cases and timeouts
Protocols and schemas
Storage database models
`VerificationRequest` model
- `subtask_id`: Can't be `NULL` or blank.
- `source_package_path`: Can't be `NULL` or blank. Must be unique and cannot be used as `result_package_path` in any model instance.
- `result_package_path`: Can't be `NULL` or blank. Must be unique and cannot be used as `source_package_path` in any model instance.
- `source_package_checksum`
- `result_package_checksum`
- `source_package_size`
- `result_package_size`
- `upload_finished`: `upload_finished` task for this subtask has already been sent to the work queue. Can't be `NULL`.
- `upload_acknowledged`: `upload_acknowledged` task for this subtask has already been processed. Can't be `NULL`.

`BlenderSubtaskDefinition` model
For each `VerificationRequest` there must be exactly one `BlenderSubtaskDefinition` in the database. In the future, when Concent supports more task types, there will be more models similar to `BlenderSubtaskDefinition`, each one associated with its own instance of `VerificationRequest`.
- `verification_request`: `VerificationRequest` foreign key. Can't be `NULL` or blank. Must be unique.
- `output_format`: `-F` command-line option. Can't be `NULL`.
- `scene_file`: Can't be `NULL` or blank.
- `blender_crop_script`: `TaskToCompute.script_src`.
- `created_at`: Can't be `NULL`.

`Frame` model
A subtask may involve rendering multiple frames. This object indicates that a specific frame should be rendered in a specific subtask.
`VerificationRequest` along with its `BlenderSubtaskDefinition` is always associated with one or more `Frame` instances.
- `blender_subtask_definition`: `BlenderSubtaskDefinition` foreign key.
- `number`: (`blender_subtask_definition`, `number`) pair must be unique.

`UploadReport` model
Existence of this object indicates that a file has been uploaded to `nginx-storage` and nginx notified Conductor about this fact.
Note that it may be possible for the client to upload a file even when there is no corresponding `VerificationRequest`. This can happen if the upload finishes before Conductor receives `blender_verification_request` from the work queue or if the upload is not done in the verification use case (but, for example, in the 'force get task result' use case).
Multiple `UploadReport` instances can exist for the same file if the client uploads it multiple times.
- `path`: Relative path of the file. Relative to the same directory that paths listed in `FileTransferToken`s are relative to. Does not have to be unique. It's technically possible for the client to upload a file twice.
- `verification_request`: `VerificationRequest` foreign key. Can be `NULL` if there's no corresponding request.
- `created_at`: Can't be `NULL`.

Work queue tasks

`blender_verification_request` task
- `subtask_id`
- `source_package_path`
- `result_package_path`
- `result_package_checksum`
- `source_package_checksum`
- `result_package_size`
- `source_package_size`
- `output_format`
- `scene_file`
- `blender_crop_script`
- `frames`

Parameters have the same meaning as in `VerificationRequest` and `BlenderSubtaskDefinition` models above.

`blender_verification_order` task
Same as `blender_verification_request`. The difference is in the purpose of these two tasks, not in their content. One notifies the storage cluster about a verification request, the other is an order issued to verifier to make it start verification.

`verification_result` task
- `subtask_id`
- `result`: `VerificationResult` enum
- `error_message`: `reason != ERROR`
- `error_code`: `reason != ERROR`

`upload_finished` task
- `subtask_id`

`upload_acknowledged` task
- `subtask_id`

`VerificationResult` enumeration
- `MATCH`
- `MISMATCH`
- `ERROR`
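As a rough illustration of the `verification_result` payload, the sketch below models the enumeration and the task parameters in plain Python. The class `VerificationResultPayload` and its `to_task_kwargs` method are hypothetical. Both error fields are left optional because their exact relation to the `reason != ERROR` condition is not spelled out in the text that survives here.

```python
from enum import Enum
from typing import NamedTuple, Optional


class VerificationResult(Enum):
    MATCH = "MATCH"
    MISMATCH = "MISMATCH"
    ERROR = "ERROR"


class VerificationResultPayload(NamedTuple):
    """Hypothetical container for the verification_result task parameters."""
    subtask_id: str
    result: VerificationResult
    error_message: Optional[str] = None
    error_code: Optional[str] = None

    def to_task_kwargs(self) -> dict:
        """Convert to a dict of plain values that any serializer can handle."""
        return {
            "subtask_id": self.subtask_id,
            "result": self.result.name,
            "error_message": self.error_message,
            "error_code": self.error_code,
        }


payload = VerificationResultPayload("16356", VerificationResult.MATCH)
task_kwargs = payload.to_task_kwargs()
```

Sending the enum by name keeps the payload a plain dict, and the receiver can restore it with `VerificationResult[task_kwargs["result"]]`.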