biigle / maia

:m: BIIGLE module for the Machine Learning Assisted Image Annotation method
GNU General Public License v3.0

Implement MAIA #3

Closed: mzur closed this issue 5 years ago

mzur commented 6 years ago

Implement the complete MAIA method in BIIGLE. Develop a workflow and the required UI, and check whether existing UI components and tools can be reused for this. Also develop an architecture that enables asynchronous use of GPU resources (via a Laravel queue).

mzur commented 6 years ago

Here is a tool that uses GPUs from inside a Docker container.

mzur commented 6 years ago

The architecture must be compatible with the following setups:

mzur commented 6 years ago

Here is the plan for the architecture. It is composed of two new BIIGLE modules (biigle/gpu, biigle/maia) and a new application (biigle/gpu-server).

biigle/gpu-server This is a stand-alone Lumen application that runs on a machine with a GPU. This can be the same machine that BIIGLE is running on or a different one. It is composed of webserver, application and Redis cache Docker containers; it accepts "jobs" from the BIIGLE application and returns the responses via HTTP. It is a very slim application that provides common interfaces (e.g. to fetch image files or to queue GPU compute jobs) for its modules. GPU access from inside Docker is realized with nvidia-docker2 as described here. The GPU server supports modules similar to BIIGLE, each module containing the HTTP API endpoints, the logic and the Python TensorFlow code for the GPU computations.
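
As a rough illustration of such a module-provided endpoint on the GPU server, a Lumen route could look like the following; the route path, controller and middleware names are assumptions for the sketch, not the actual implementation:

```php
<?php

// Routes file of a GPU server module (hypothetical names): accepts a job
// from an authorized BIIGLE instance and hands it to a controller that
// queues it for GPU computation.

/** @var \Laravel\Lumen\Routing\Router $router */
$router->post('api/v1/maia-jobs', [
    'middleware' => 'auth.token', // hypothetical API token middleware
    'uses' => 'MaiaJobController@store',
]);
```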

The GPU server can be configured to accept jobs from one or more BIIGLE instances which are authorized via API tokens. These are permanently configured without the need for a database.
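
A minimal sketch of such a static token configuration, assuming a hypothetical config/gpu.php on the GPU server; the file name, keys and env variables are illustrative only:

```php
<?php

// config/gpu.php (hypothetical): BIIGLE instances that are allowed to
// submit jobs, configured via environment variables instead of a database.
return [
    'allowed_instances' => [
        // instance name => API token
        'biigle-bielefeld' => env('GPU_TOKEN_BIELEFELD'),
        'biigle-geomar'    => env('GPU_TOKEN_GEOMAR'),
    ],
];
```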

It should also be possible to integrate the application Docker container of the GPU server into the existing Docker Compose ensemble of containers of a regular BIIGLE instance. This is a use case for a single-machine instance (like the Jetson) where the whole setup should behave like a single application.

biigle/gpu This is a module both for BIIGLE and the GPU server. It provides two service provider classes, one for BIIGLE and one for the GPU server. It manages communication between BIIGLE and the GPU server. It provides a central interface for other BIIGLE modules that want to submit jobs to the GPU server and handles forwarding of these jobs (using the API token for the GPU server) as well as the responses (and error handling).
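
The dual service provider idea could be sketched like this (class names, config keys and the routes file are assumptions, not the final module code):

```php
<?php

namespace Biigle\Modules\Gpu;

use GuzzleHttp\Client;
use Illuminate\Support\ServiceProvider;

// Loaded by the BIIGLE application: binds the HTTP client that other
// modules use to forward their jobs to the GPU server.
class BiigleServiceProvider extends ServiceProvider
{
    public function register()
    {
        $this->app->singleton('gpu.client', function () {
            return new Client([
                'base_uri' => config('gpu.url'),
                'headers' => ['Authorization' => 'Bearer '.config('gpu.token')],
            ]);
        });
    }
}

// Loaded by the GPU server (Lumen): registers the endpoints that accept
// incoming jobs from authorized BIIGLE instances.
class GpuServerServiceProvider extends ServiceProvider
{
    public function boot()
    {
        require __DIR__.'/Http/routes.php';
    }
}
```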

This module should support multiple adapters for communication with the GPU server. Among these:

biigle/maia (this repository) This is a module both for BIIGLE and the GPU server. It provides two service provider classes, one for BIIGLE and one for the GPU server. The code for BIIGLE implements the UI and the logic required to submit new MAIA jobs to the GPU server (using biigle/gpu). The code for the GPU server contains the API endpoints that accept the jobs, the logic to create the training datasets and to process the result images, as well as the autoencoder and Mask R-CNN Python code that is executed on the GPU.
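
On the BIIGLE side, submitting a new MAIA job through biigle/gpu could then look roughly like the following queued job (all class and endpoint names are assumptions for the sketch):

```php
<?php

namespace Biigle\Modules\Maia\Jobs;

// Hypothetical queued job on the BIIGLE side that forwards a new MAIA job
// for one image volume to the GPU server via the client bound by biigle/gpu.
class SubmitMaiaJob
{
    public $volumeId;

    public function __construct($volumeId)
    {
        $this->volumeId = $volumeId;
    }

    public function handle()
    {
        app('gpu.client')->post('api/v1/maia-jobs', [
            'json' => ['volume_id' => $this->volumeId],
        ]);
    }
}
```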

dlangenk commented 6 years ago

This seems very robust but also quite complicated. Wouldn't it make sense to have a more lightweight architecture with fewer components, e.g. why run the GPU code inside Docker?

mzur commented 6 years ago

Feel free to suggest a simpler architecture :wink: It got this complicated because I have to cover all planned use cases: our OpenStack setup with on-demand GPU instances, the Geomar setup with some as yet undefined future GPU resources, and any instance on a laptop, Jetson or mobile GPU cluster (Geomar is building one) on a ship. The architecture should also be future-proof so it can support several modules that utilize a GPU.

I chose Docker here, too, because it offers the same advantages for controlled environments running at different locations (Bielefeld, Geomar, ship etc.) as the Docker setup of the BIIGLE main application. I learned my lesson from asking everybody to install and update the dependencies for their BIIGLE instances themselves; distributing Docker images is much easier for me. In addition, the application Docker container of the GPU server can be integrated into the Docker Compose ensemble of a BIIGLE application, so it is very easy to install and start everything on a single machine like the Jetson.

mzur commented 6 years ago

Instead of biigle/gpu, I decided to implement biigle/laravel-remote-queue as a generic package to submit queued jobs to another Laravel/Lumen instance. BIIGLE will use it to submit GPU jobs to the GPU server and the GPU server will use it to submit "response" jobs back to BIIGLE.
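
Conceptually, dispatching a job to the GPU server would then be no different from dispatching any other queued job, only on a dedicated queue connection. A minimal sketch, assuming a connection named 'gpu' and an illustrative job class (not the package's actual API):

```php
<?php

use Illuminate\Support\Facades\Queue;
use Biigle\Modules\Maia\Jobs\NoveltyDetectionRequest; // illustrative job class

// On the BIIGLE side: push the GPU job onto a queue connection whose driver
// forwards it to the GPU server over HTTP.
$volumeId = 123;
Queue::connection('gpu')->push(new NoveltyDetectionRequest($volumeId));

// On the GPU server: once the computation is done, a "response" job is
// pushed back to BIIGLE the same way, e.g. on a connection named 'biigle'.
```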

The big difference from Laravel's other (remote) queue drivers will be that the extended package biigle/remote-queue-openstack will allow submitting jobs to an OpenStack compute instance that is lazily booted and shut down again, as described above.

mzur commented 5 years ago

What icon should we use? Suggestions:

  1. Robot
  2. Astronaut
  3. Bolt
  4. Tachometer
  5. Bullseye

dlangenk commented 5 years ago

Bullseye or Robot

mzur commented 5 years ago

I'm no longer sure if dynamically resuming/suspending GPU instances is the way to go. In order for BIIGLE to be able to modify compute instances, we would need to give it full access to our OpenStack project. This means that any attacker who gains access to the BIIGLE instance also gains access to our OpenStack project and can delete everything (except the database backups on the Cebitec file system).

Edit: I deleted the laravel-remote-queue-openstack repository again. If we ever need it, I have a local copy.

mzur commented 5 years ago

Done with v1.