golemfactory / concent

Repository for Concent Service sources
8 stars 7 forks source link

[WIP][Blueprint] Additional verification #193

Open cameel opened 6 years ago

cameel commented 6 years ago

NOTE: This is still work in progress. Diagrams are mostly finished but I'm still working on the text to accompany them and explain the details.

Clusters and services running on them

concent-clusters

Containers

geth

concent-container-geth

rabbitmq

concent-container-rabbitmq

nginx-proxy

concent-container-nginx-proxy

nginx-storage

concent-container-nginx-storage

concent-api

concent-container-concent-api

verifier

concent-container-verifier

Communication

Positive case

concent-verification-sequence

Negative cases and timeouts

Protocols and schemas

Storage database models

VerificationRequest model

Column name Type Remarks
subtask_id string Subtask ID. Can't be NULL or blank.
source_package_path string Relative path of the .zip file that contains Blender source files for the render. Can't be NULL or blank. Must be unique and cannot be used as result_package_path in any model instance.
result_package_path string Relative path of the .zip file that contains the rendering result received from the provider. Can't be NULL or blank. Must be unique and cannot be used as source_package_path in any model instance.
source_package_checksum string Hash of the source package. Consists of two parts separated with a colon: checksum type and the checksum itself.
result_package_checksum string Hash of the result package. Consists of two parts separated with a colon: checksum type and the checksum itself.
source_package_size int Size of the source package in bytes.
result_package_size int Size of the result package in bytes.
upload_finished bool True when upload_finished task for this subtask has already been sent to the work queue. Can't be NULL.
upload_acknowledged bool True when upload_acknowledged task for this subtask has already been processed. Can't be NULL.

BlenderSubtaskDefinition model

For each VerificationRequest there must be exactly one BlenderSubtaskDefinition in the database. In the future, when Concent supports more task types, there will be more models similar to BlenderSubtaskDefinition, each one associated with its own instance of VerificationRequest.

Column name Type Remarks
verification_request VerificationRequest foreign key Can't be NULL or blank. Must be unique.
output_format enum Type of the output image to be produced by Blender. This determines the file extensions. Only formats supported by Blender should be allowed here. This value is passed to Blender using the -F command-line option. Can't be NULL.
scene_file string Relative path to the .blend file inside the source package. Can't be NULL or blank.
blender_crop_script string The script to be run by Blender. It sets rendering parameters, including the position and dimensions of the image fragment that should be rendered. Comes from TaskToCompute.script_src.
created_at datetime Indicates when Conductor has received the request. Can't be NULL.

Frame model

A subtask task may involve rendering multiple frames. This object indicates that a specific frame should be rendered in a specific subtask.

VerificationRequest along with its BlenderSubtaskDefinition is always associated with one or more Frame instances.

Column name Type Remarks
blender_subtask_definition BlenderSubtaskDefinition foreign key Non-null.
number int Frame number. Non-null. Non-negative.

(blender_subtask_definition, number) pair must be unique.

UploadReport model

Existence of this object indicates that a file has been uploaded to nginx-storage and nginx notified Conductor about this fact.

Note that it may be possible for the client to upload a file even when there is no corresponding VerificationRequest. This can happen if the upload finishes before Conductor receives blender_verification_request from work queue or if the upload is not done in the verification use case (but for example the 'force get task result' use case).

Multiple UploadReport instances can exist for the same file - if the client uploads it multiple times.

Column name Type Remarks
path string Relative path of the file. Relative to the same directory that paths listed in FileTransferTokens are relative to. Does not have to be unique. It's technically possible for the client to upload a file twice.
verification_request VerificationRequest foreign key Can be NULL if there's no corresponding request.
created_at datetime Indicates when conductor has been notified about the upload. Can't be NULL.

Work queue tasks

work-queue

blender_verification_request task

Parameter name Type
subtask_id string
source_package_path string
result_package_path string
result_package_checksum string
source_package_checksum string
result_package_size int
source_package_size int
output_format enum
scene_file string
blender_crop_script string
frames list of int

Parameters have the same meaning as in VerificationRequest and BlenderSubtaskDefinition models above.

blender_verification_order task

Same as blender_verification_request. The difference is in the purpose of these two tasks, not in their content. One notifies the storage cluster about a verification request, the other is an order issued to verifier to make it start verification.

verification_result task

Parameter name Type Remarks
subtask_id string
result VerificationResult enum
error_message string Human-readable error message. Must be blank if reason != ERROR
error_code string Machine-readable error code. Must be blank if reason != ERROR

upload_finished task

Parameter name Type Remarks
subtask_id string

upload_acknowledged task

Parameter name Type Remarks
subtask_id string

VerificationResult enumeration

rwrzesien commented 6 years ago

@cameel

verification_request task Parameters: compute_task_def - ComputeTaskDef from TaskToCompute

In what format this is going to be passed/received ? I think that you can't pass unserialized objects to celery tasks.

cameel commented 6 years ago

As a dict. If we can't pass a dict, we'll have to split it into individual parameters.

This is just an implementation detail so I did not want to get too much into specifics. Basically - we want to pass ComputeTaskDef (or at least all relevant parts) between Concent and Conductor and how exactly we achieve this is a different concern. I know it's doable.

rwrzesien commented 6 years ago

Right, I have forgot that ComputeTaskDef is a dict, not a message, then there is no problem.

cameel commented 6 years ago

Update: Changed the structure of database tables and Celery tasks:

The old version of the part that has changed (for reference)

Protocols and schemas

Storage database models

VerificationRequest model

Columns:

UploadRequest model

Existence of this object indicates that Concent is expecting a client to upload a specific file.

Columns:

UploadReport model

Existence of this object indicates that a file has been uploaded to nginx-storage and nginx notified Conductor about this fact.

Note that may be possible for the client to upload a file even when there is no corresponding UploadRequest. This can happen if the upload finishes before Conductor receives verification_request from work queue or if the upload is not done in the verification use case (but for example the 'force get task result' use case).

Multiple UploadReport instances can exist for the same file - if the client uploads it multiple times.

Columns:

Work queue tasks

verification_request task

Parameters:

verification_order task

Parameters:

verification_result task

Parameters:

VerificationResult enumeration

cameel commented 6 years ago

Update: UploadRequest model has been removed since now VerificationRequest is always associated with exactly two files and has fields that contain their paths.

The old version of the part that has changed (for reference)

UploadRequest model

Existence of this object indicates that Concent is expecting a client to upload a specific file.

Column name Type Remarks
verification_request VerificationRequest foreign key Can't be NULL.
path string Relative path of the file. Relative to the same directory that paths listed in FileTransferTokens are relative to. Must be unique. Concent core (which sends the request) is responsible for tracking all verification requests and must never generate duplicate paths. Can't be blank or NULL.

UploadReport model

Existence of this object indicates that a file has been uploaded to nginx-storage and nginx notified Conductor about this fact.

Note that it may be possible for the client to upload a file even when there is no corresponding UploadRequest. This can happen if the upload finishes before Conductor receives blender_verification_request from work queue or if the upload is not done in the verification use case (but for example the 'force get task result' use case).

Multiple UploadReport instances can exist for the same file - if the client uploads it multiple times.

Column name Type Remarks
path string Relative path of the file. Relative to the same directory that paths listed in FileTransferTokens are relative to. Does not have to be unique. It's technically possible for the client to upload a file twice.
upload_request UploadRequest foreign key Can be NULL if there's no corresponding request.
created_at datetime Indicates when conductor has been notified about the upload. Can't be NULL.
cameel commented 6 years ago

Updates for #411 and #413:

cameel commented 6 years ago

Update: Added package checksums and sizes to VerificationRequest, blender_verification_request and blender_verification_order.

rwrzesien commented 6 years ago
  1. In send_blender_verification_request add validation if all required data is available.
  2. Celery routing is not working at all, when running celery worker with celery -A concent_api worker -l info -Q concent,conductor,verifier the exception is:
Traceback (most recent call last):
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/app/amqp.py", line 87, in __getitem__
    return self.aliases[name]
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/weakref.py", line 137, in __getitem__
    o = self.data[key]()
KeyError: 'concent'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/worker/worker.py", line 171, in setup_queues
    self.app.amqp.queues.select(include)
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/app/amqp.py", line 188, in select
    name: self[name] for name in maybe_list(include)
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/app/amqp.py", line 188, in <dictcomp>
    name: self[name] for name in maybe_list(include)
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/app/amqp.py", line 89, in __getitem__
    return dict.__getitem__(self, name)
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/app/amqp.py", line 101, in __missing__
    raise KeyError(name)
KeyError: 'concent'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rwr/work/codepoets/golem/venv-concent/bin/celery", line 11, in <module>
    sys.exit(main())
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/__main__.py", line 14, in main
    _main()
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/celery.py", line 326, in main
    cmd.execute_from_commandline(argv)
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/celery.py", line 488, in execute_from_commandline
    super(CeleryCommand, self).execute_from_commandline(argv)))
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/base.py", line 281, in execute_from_commandline
    return self.handle_argv(self.prog_name, argv[1:])
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/celery.py", line 480, in handle_argv
    return self.execute(command, argv)
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/celery.py", line 412, in execute
    ).run_from_argv(self.prog_name, argv[1:], command=argv[0])
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/worker.py", line 221, in run_from_argv
    return self(*args, **options)
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/base.py", line 244, in __call__
    ret = self.run(*args, **kwargs)
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/bin/worker.py", line 255, in run
    **kwargs)
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/worker/worker.py", line 99, in __init__
    self.setup_instance(**self.prepare_args(**kwargs))
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/worker/worker.py", line 105, in setup_instance
    self.setup_queues(queues, exclude_queues)
  File "/home/rwr/work/codepoets/golem/venv-concent/lib/python3.6/site-packages/celery/worker/worker.py", line 174, in setup_queues
    SELECT_UNKNOWN_QUEUE.strip().format(include, exc))
celery.exceptions.ImproperlyConfigured: Trying to select queue subset of ['concent', 'conductor', 'verifier'], but queue 'concent' isn't
defined in the `task_queues` setting.
  1. report_upload view could also receive subtask_id as url parameter for better security.
  2. Should report_upload handle GET or different HTTP method ?
  3. report_upload need some better of handling duplicated requests. Currently I can create unlimited number UploadReport objects for one file. There should be some kind of distinction if currently processed files is source or results and that there will be exactly one UploadReport for each of those. Also, upload_finished can be scheduled unlimited number of times.
  4. upload_acknowledged queries for VerificationRequest with given subtask_id and changes VerificationRequest.upload_acknowledged to True. Shouldn't it in the query filter by upload_acknowledged=False ?
  5. There is a problem with downloading files. According to the way in which we store and build file paths, we have two files for each subtask, ie: blender/source/16356/16356.sub_task_id.zip blender/result/16356/16356.sub_task_id.zip So, actually the file names are the same. This is a problem during download in blender_verification_order because second file is replacing first one.
cameel commented 6 years ago
  1. In send_blender_verification_request add validation if all required data is available.

You mean you want to create an issue to do it? This does not sound like something that's in the spec.

  1. Celery routing is not working at all, when running celery worker with celery -A concent_api worker -l info -Q concent,conductor,verifier the exception is:

This seems to have been a simple bug. See #442. task_create_missing_queues was set to False in celery.py but the queues were not defined so it crashed. We have switched it to True for now and @dybi will define the queues and switch it back in a separate pull request.

  1. report_upload view could also receive subtask_id as url parameter for better security.

But it would require nginx-storage to decode the token. The URL is intentionally as simple as possible so that nginx does not have to do any significant work.

In this design the responsibility for making sure that no one can overwrite someone else's files lies entirely on the control cluster. The storage cluster is meant to be dumb an accept anything the control cluster orders it to do.

  1. Should report_upload handle GET or different HTTP method ?

Only POST. GET requests should not be used for actions that change server state. See #208.

  1. report_upload need some better of handling duplicated requests. Currently I can create unlimited number UploadReport objects for one file. There should be some kind of distinction if currently processed files is source or results and that there will be exactly one UploadReport for each of those.

That's by design. UploadReport only says that a file with a specific name has been uploaded. Nothing more. There is not and should not be anything saying why it was uploaded. If user uploads a file three times, you get three reports (with different timestamps) and that's fine.

Later, when conductor is checking if the files match VerificationRequest, it should fetch the reports and ignore duplicates. If a file has been uploaded multiple times, it's not our business. There's no point in doing that but it's possible.

Also, upload_finished can be scheduled unlimited number of times.

That's a bug. upload_finished should be scheduled if and only if:

This can happen only once.

  1. upload_acknowledged queries for VerificationRequest with given subtask_id and changes VerificationRequest.upload_acknowledged to True. Shouldn't it in the query filter by upload_acknowledged=False ?

It will work fine either way. But it would be a good idea to log an error if upload_acknowledged is already True.

  1. There is a problem with downloading files. According to the way in which we store and build file paths, we have two files for each subtask, ie: blender/source/16356/16356.sub_task_id.zip blender/result/16356/16356.sub_task_id.zip So, actually the file names are the same. This is a problem during download in blender_verification_order because second file is replacing first one.

That's a bug in the implementation. You don't have to store the file on disk with exactly the same name. You can change it so that there's no conflict. Add a prefix or store each file in a subdirectory.

cameel commented 6 years ago

Update: added work queue diagram to the description.

rwrzesien commented 6 years ago

@cameel

You mean you want to create an issue to do it? This does not sound like something that's in the spec.

It can be an issue, I am just writing this as general conclusion.

Only POST. GET requests should not be used for actions that change server state. See #208.

It needs fixing then, I will create an issue for it.

That's a bug. upload_finished should be scheduled if and only if: before the request either VerificationRequest or any UploadReport did not exist after the request they all exist This can happen only once.

I will add issue for this.

It will work fine either way. But it would be a good idea to log an error if upload_acknowledged is already True.

I will add issue for this.

That's a bug in the implementation. You don't have to store the file on disk with exactly the same name. You can change it so that there's no conflict. Add a prefix or store each file in a subdirectory.

It needs fixing then too, I will create an issue for it.

cameel commented 6 years ago

Description updated for #537 and #520.

dybi commented 6 years ago

@cameel, please update description according to: https://github.com/golemfactory/concent/issues/610