EUDAT-B2STAGE / http-api

RESTful HTTP-API for the B2STAGE service inside the EUDAT project
https://eudat-b2stage.github.io/http-api/
MIT License

Asynchronous tasks #9

Closed pdonorio closed 8 years ago

pdonorio commented 8 years ago

A technical operation to discuss is how to use asynchronous operations inside the python code.

A useful tutorial for this kind of operations is this post about Flask and Celery.

jrybicki-jsc commented 8 years ago

it is interesting to see how you are trying to re-implement something which was working (https://github.com/EUDAT-B2STAGE/B2STAGE-HTTP); nothing like double effort.

But with regard to celery and flask, there were some issues; you can find more details at: https://github.com/httpPrincess/celery-flask-problem

hope this helps

pdonorio commented 8 years ago

it is interesting to see how you try to re-implement something which was working

Hi JJ, the "rewrite from scratch" was decided in a previous TC. Roberto, who wrote the old version you linked, is working with me on this new version, so we are trying to avoid the same mistakes. We are using a very different approach, trying our best to make the code more reusable this time, as we already have many other REST API projects to implement.

But with regard to celery flask, there were some issues

Very interesting. I have never used Celery so far, but everyone has told me it is very powerful. I guess the problem you encountered has already been resolved in later releases of the package, but I will check if I run into the same situation.

For any suggestion i am very open 😉

jrybicki-jsc commented 8 years ago

just to get the facts straight: I see no roberto here: https://github.com/EUDAT-B2STAGE/B2STAGE-HTTP/graphs/contributors

Also, I am not blaming you or anyone for anything; the task was killed by appointing the laziest and most unqualified person (shaun de witt) as manager. The software was in a usable state almost 2 years ago. Even if you implement at the speed of light you won't be ready in the past ;)

the issues with celery were pretty hard to debug, thus I shared them. In the end celery was working, and it was working pretty much ok. Check the workflow feature it offers; it fits pretty well with the b2safe workflow. The only problem is the very high level of abstraction it offers: it does not fit all the use cases, so for some other projects I used pure messaging.

pdonorio commented 8 years ago

just to get the facts straight: I see no roberto here

You're right, I see that you were also involved. Anyway, I hope things go differently this time ^_^

The only problem is the very high abstraction it offers

Thanks for sharing!

pdonorio commented 8 years ago

A note for myself, an interesting read on celery: http://blog.domanski.me/how-celery-fixed-pythons-gil-problem/

pdonorio commented 8 years ago

It feels like asynchronous tasks are becoming more important inside this project at every meeting. So, before my holidays, I'd like to spin up a quick test on a celery task served by Python.

Things that should be done in that case:

pdonorio commented 8 years ago

A general discussion about task queues in python, just for the record: https://www.fullstackpython.com/task-queues.html

Extracting from there a quicker possibility: http://nvie.com/posts/introducing-rq/

pdonorio commented 8 years ago

Wrote a gist with celery, since I got it working after finding the right tutorial to fix ideas.

pdonorio commented 8 years ago

@mdantonio tomorrow we can put the docker stuff inside our compose and make a first test

pdonorio commented 8 years ago

To return partial content (e.g. when the request was queued in celery) we don't know whether the 206 code or the 202 applies. To be further investigated.

mdantonio commented 8 years ago

206 PARTIAL CONTENT should only be used with GET calls

The golden standard that should be implemented in case of asynchronous calls is the following:

The async endpoint returns a 202 Accepted, because the request has been accepted for processing but has not been completed, and adds a Location: url header to provide the client with a URL to verify the request status

the status_verification_url returns a 200 OK if the request is still in progress (and should provide as much information as possible, e.g. a progress indicator, an expected polling interval, etc.)

When the request is completed this URL returns a 201 CREATED (a 303 SEE OTHER can also be used) with a Location: url header. In this case the url contains the created resource.

A DELETE on status_verification_url could also be implemented, to stop requests
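Under stated assumptions (an in-memory task dict standing in for a real Celery result backend, and hypothetical endpoint names), the pattern above can be sketched in Flask as:

```python
from uuid import uuid4
from flask import Flask, jsonify, url_for

app = Flask(__name__)
tasks = {}  # task_id -> {"status": ..., "result": ...}; a stand-in for Celery

@app.route("/queue", methods=["POST"])
def enqueue():
    task_id = str(uuid4())
    tasks[task_id] = {"status": "PENDING", "result": None}
    # 202 Accepted: queued but not completed; Location points to the status URL
    response = jsonify(task_id=task_id)
    response.status_code = 202
    response.headers["Location"] = url_for("status", task_id=task_id)
    return response

@app.route("/queue/<task_id>", methods=["GET"])
def status(task_id):
    task = tasks[task_id]
    if task["status"] != "SUCCESS":
        # still in progress: 200 OK plus whatever progress info is available
        return jsonify(status=task["status"]), 200
    # completed: 201 Created (or 303 See Other) pointing at the new resource
    response = jsonify(result=task["result"])
    response.status_code = 201
    response.headers["Location"] = "/resources/%s" % task_id
    return response
```

A real implementation would query Celery's result backend instead of the `tasks` dict, and could add a DELETE handler on the status URL to cancel the task.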

pdonorio commented 8 years ago

@mdantonio you may start working on your celery tasks.

You should start with my restapi-template, and then we discuss how to integrate.

Steps to use one:

  1. Write a task in vanilla/tasks like this one
  2. Launch workers. E.g. in my template repo you can use ./do DEBUG 2 to have 2 workers running.
mdantonio commented 8 years ago

Why this issue has been closed?

I made some tests and implemented a workflow as discussed above, e.g.

POST /queue -> returns 202 Accepted, TaskID and Location url

GET /queue/TaskID [ == Location url ] -> returns 201 Created when the task is completed, and 200 OK with task information (and running progress) otherwise

Everything works quite fine, but we have several points to discuss.

Two main problems are: how to access resources (graphdb, irods) from a task, and how to distinguish between never-existed tasks, pending tasks and completed-and-forgotten tasks. With celery these three categories all look the same. Categories one and three could be considered the same, but category two should be distinguishable. Probably we have to store task information in graphdb, because some information is missing in redis/celery.

Errors and failures are to be further investigated. Celery gives enough information and utilities to handle errors, but we have to implement a strong workflow to avoid asynchronous operations falling into a deep well.

Celery is able to report the exit status (success or failure) and exceptions raised during task execution; it can run distinct callback functions in case of success or failure, and it is able to perform automatic retries. We have to combine these tools to implement a foolproof workflow.


Note that I need two different async workflows for my application.

The first is the "standard one", as described above (e.g. for batch import operations).

A second one is similar to your case. When I import a file into a dataset I create a File node and return the FileAccessionID. At this point I have to parse the file to obtain information (e.g. the number of reads), and this operation can be performed asynchronously without any particular need to inform the user about its progress and completion (subsequent retrievals of file information will simply provide more information than before).

pdonorio commented 8 years ago

Why this issue has been closed?

Because we may open others for the upcoming problems 😉

how to access to resources (graphdb, irods) from a task

I will reopen here and fix this

and how to distinguish between never-existed-tasks, pending tasks and completed-and-forgot-tasks

I have no idea. We need to dig a little more and probably discuss this in another issue

Errors and failures are to be further investigated.

Another issue again 😛

pdonorio commented 8 years ago

how to access to resources (graphdb, irods) from a task

We just learned the hard way how difficult it is to share objects in Flask when it is used somewhat out of context, for example in a celery task, which has no concept of a "request".

The solution currently being tested is to share a global class across the whole code base to store shared services.
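A minimal sketch of that approach (the class name and service names are hypothetical):

```python
# a module-level registry of shared service handles, importable from both
# Flask request code and Celery tasks (which have no request context)
class ServiceFarm:
    _services = {}

    @classmethod
    def register(cls, name, service):
        cls._services[name] = service

    @classmethod
    def get(cls, name):
        return cls._services[name]

# at startup (both in the Flask factory and in the worker):
#     ServiceFarm.register("graphdb", make_graphdb_connection())
# inside a Celery task, with no Flask request available:
#     graphdb = ServiceFarm.get("graphdb")
```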

Will clean the code, commit and close.

pdonorio commented 8 years ago

Warning @mdantonio and @muccix

We created a testing_mode (which is also more consistent with the worker_mode for celery) inside the flask factory method.

To all of us it means that inside tests:

# WRONG
app = create_app(testing=True)
# RIGHT
app = create_app(testing_mode=True)
pdonorio commented 8 years ago

Closing with a working example in 4a387b0020bba16537d3480d1019acd2dbda12ea

mdantonio commented 8 years ago

Closing with a working example

The example works (it is able to connect to neo4j), but custom models are not loaded. Celery prints the following error:

CRITICAL/Worker-2] Failed to load resource: No module named 'commons.models.custom.neo4j'

pdonorio commented 8 years ago

Can you open a bash shell inside the worker and check whether the directory is mounted correctly, and also its content?

mdantonio commented 8 years ago

From inside the container the directory is mounted, but it is empty

mdantonio commented 8 years ago

The base docker-compose is missing some volumes for the worker service; in particular it is missing this (copy-pasted from the backend service):

    # Rest API services models
    - ../vanilla/models:/code/commons/models/custom
mdantonio commented 8 years ago

After copying a lot of things (volumes and environment vars) from backend to worker in both the base and custom docker-compose.yml, everything now works fine.

Have we already found a mechanism to avoid this manual synchronization of the backend and worker services?

pdonorio commented 8 years ago

We already did in the other template repo, see: https://github.com/pdonorio/restapi-template/blob/master/docker-compose.yml#L49-L54

mdantonio commented 8 years ago

Very well, I integrated this configuration.

I removed the VOLUME_PREFIX since it is not yet supported by the current do command

mdantonio commented 8 years ago

MMM, the extension leads to a port collision

backend:
  extends:
    service: conf
  image: pdonorio/py3apil
  hostname: restapi
  ports:
    - 8081:5000
  volumes:
    [...]

worker:
  extends:
    file: docker-compose.yml
    service: backend
  hostname: celworker
  command: celery worker -A restapi.resources.services.celery.worker.celery_app

With the extension, the worker inherits the backend port; how can it be overridden? Or maybe I misunderstood the configuration?

Adding a different port on the worker service doesn't work, because:

For the multi-value options ports, expose, external_links, dns, dns_search, and tmpfs, Compose concatenates both sets of values: (from https://docs.docker.com/compose/extends)

Here is the error obtained when starting the worker:

docker-compose scale worker=1

WARNING: The "worker" service specifies a port on the host. If multiple containers for this service are created on a single host, the port will clash.
Starting containers_worker_1 ... error

ERROR: for containers_worker_1  b'failed to create endpoint containers_worker_1 on network bridge: Bind for 0.0.0.0:8081 failed: port is already allocated'
pdonorio commented 8 years ago

No port should be specified in the original docker-compose file. Ports could be opened for each service in debug, development and/or production, depending on the level of security.

You should open a port in debug for the API backend. Workers, instead, do not need to listen on any port.


mdantonio commented 8 years ago

Ok, I understand. So the main problem is that the restangulask configuration is still quite archaic (a single yaml with APP_MODE inside).

Considering that, I think the refactoring of both the project structure and the do command is now a priority, also to achieve a stable configuration for Celery (for now I will continue with tests by adding some workarounds to the current configuration)

-> add to the agenda for Tuesday

pdonorio commented 8 years ago

Ok, so we should do that in pdonorio/restangulask#32