galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Container abstraction planning #3804

Closed natefoo closed 7 years ago

natefoo commented 7 years ago

Recent work on supporting Docker swarm for GIEs has made it apparent that we need some level of abstraction between container consumers (e.g. GIEs or jobs) and container systems (e.g. Docker), so that we're not writing Docker-specific settings into GIE configs, and so that we can support more complicated environments.

I've also discovered that my initial plan of supporting a swarm that spans both Jetstream zones (TACC and IU) won't work as neatly as planned, for the reasons outlined in #3802.

The idea would be to build off of the "swarm manager" config I recently added and move those config options to a dict-of-dicts in a more generic config (containers_conf.yml, but eventually in galaxy.yml when that happens?) e.g.:

containers:
  _default_:
    type: docker
  remote:
    type: docker
    command: docker --tlsverify -H tcp://dockerd.example.org:2376/ {docker_args}

And a more complex example:

containers:
  select_at_random:
    type: random
    group: 
      - jetstream_iu
      - jetstream_tacc
  select_by_function:
    type: python
    python_function: galaxy.containers.rules.select_container
  jetstream_iu:
    type: docker
    docker_command: docker --tlsverify -H tcp://jetstream-iu:2376 {docker_args}
    swarm_mode: yes
  jetstream_tacc:
    type: docker
    docker_command: docker --tlsverify -H tcp://jetstream-tacc:2376 {docker_args}
    swarm_mode: yes

Where galaxy.containers.rules.select_container() would take some arguments a la dynamic job rules and return the name of another key in the containers dict.
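A minimal sketch of how that selection could work. Everything here is an assumption for illustration rather than existing Galaxy code: the `resolve()` helper, the `RULES` registry (standing in for importing the function by its dotted path), and the `user_email` rule argument are all hypothetical.

```python
import random

# Hypothetical in-memory form of the "containers" dict from the example above.
CONTAINERS = {
    "select_at_random": {"type": "random", "group": ["jetstream_iu", "jetstream_tacc"]},
    "select_by_function": {
        "type": "python",
        "python_function": "galaxy.containers.rules.select_container",
    },
    "jetstream_iu": {"type": "docker", "swarm_mode": True},
    "jetstream_tacc": {"type": "docker", "swarm_mode": True},
}


def select_container(containers, user_email=None):
    """Hypothetical rule: return the name of another key in the containers dict."""
    # Assumed policy, for illustration only: route IU users to the IU zone.
    if user_email and user_email.endswith("@iu.edu"):
        return "jetstream_iu"
    return "jetstream_tacc"


# Stand-in registry; a real implementation would presumably import by dotted path.
RULES = {"galaxy.containers.rules.select_container": select_container}


def resolve(name, containers, rng=random, max_depth=10, **rule_kwargs):
    """Follow selector entries until a concrete (non-selector) container is reached."""
    for _ in range(max_depth):
        conf = containers[name]
        if conf["type"] == "random":
            name = rng.choice(conf["group"])
        elif conf["type"] == "python":
            name = RULES[conf["python_function"]](containers, **rule_kwargs)
        else:
            return name, conf
    raise ValueError("container selection did not terminate; check for cycles")
```

Allowing selectors to point at other selectors keeps the config flexible, which is why the sketch bounds the number of hops instead of assuming a single level of indirection.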

As for mapping GIEs to containers, we could do this with a dict like:

interactive_environment_plugins:
  jupyter:
    container: select_by_function
  r_studio:
    container: select_at_random
  phinch:
    container: jetstream_iu
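The lookup from a GIE plugin to its container entry could be sketched roughly like this. The `container_for_plugin()` helper is hypothetical, and falling back to `_default_` when a plugin names no container is an assumption borrowed from the first example, not something decided here.

```python
DEFAULT_KEY = "_default_"  # assumed fallback key, taken from the first example


def container_for_plugin(plugin_name, plugins_conf, containers):
    """Return the containers key configured for a GIE plugin (hypothetical helper)."""
    plugin_conf = plugins_conf.get(plugin_name) or {}
    key = plugin_conf.get("container", DEFAULT_KEY)
    # Fail early on typos rather than at container launch time.
    if key not in containers:
        raise KeyError(f"plugin {plugin_name!r} references unknown container {key!r}")
    return key


# Parsed form of the interactive_environment_plugins mapping above.
plugins_conf = {
    "jupyter": {"container": "select_by_function"},
    "r_studio": {"container": "select_at_random"},
    "phinch": {"container": "jetstream_iu"},
}
# Trimmed-down containers dict, enough to validate the references.
containers = {
    "_default_": {"type": "docker"},
    "select_at_random": {"type": "random", "group": ["jetstream_iu", "jetstream_tacc"]},
    "select_by_function": {"type": "python"},
    "jetstream_iu": {"type": "docker"},
    "jetstream_tacc": {"type": "docker"},
}
```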

It'd be great if this same config worked for jobs running in containers, but jobs are more complicated: containers would typically sit at a level beneath a DRM, but might (as in Kubernetes) be managed by the DRM itself.

cc: @erasche @bgruening @jmchilton

jmchilton commented 7 years ago

  jetstream_tacc:
    type: docker
    docker_command: docker --tlsverify -H tcp://jetstream-tacc:2376 {docker_args}
    swarm_mode: yes

should be

  jetstream_tacc:
    type: docker
    force_tlsverify: true
    host: "tcp://jetstream-tacc:2376"
    swarm_mode: yes

in my opinion.
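A rough sketch of how the declarative form could still be turned into the same command line. The `docker_command()` helper is hypothetical; `force_tlsverify` and `host` are the keys proposed above, and `--tlsverify` and `-H` are the Docker CLI flags already used in the original example.

```python
import shlex


def docker_command(conf, docker_args):
    """Build a docker argv list from a declarative container entry (hypothetical helper)."""
    argv = ["docker"]
    if conf.get("force_tlsverify"):
        argv.append("--tlsverify")
    host = conf.get("host")
    if host:
        argv.extend(["-H", host])
    # Caller-supplied arguments, the equivalent of {docker_args} in the template form.
    argv.extend(docker_args)
    return argv


jetstream_tacc = {
    "type": "docker",
    "force_tlsverify": True,
    "host": "tcp://jetstream-tacc:2376",
    "swarm_mode": True,
}
print(shlex.join(docker_command(jetstream_tacc, ["ps"])))
# prints: docker --tlsverify -H tcp://jetstream-tacc:2376 ps
```

Building an argument list instead of filling in a string template also avoids shell quoting issues, and the same keys could be mapped to a different client later.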

jmchilton commented 7 years ago

We could tie this into the local job runner or the Python job manager in Pulsar pretty easily if it is architected well. Using a DRM that supports containers would probably be better, but it might be worth the effort anyway to ensure the abstractions are useful and reusable outside of IEs.

bgruening commented 7 years ago

Like it! I second John's recommendation: we should be more declarative here and stay open to running the stack with rkt or similar technologies in the future.