flux-framework / flux-operator

Deploy a Flux MiniCluster to Kubernetes with the operator
https://flux-framework.org/flux-operator/
MIT License

Support > 1 container in CRD #31

Closed: vsoch closed this issue 1 year ago

vsoch commented 2 years ago

Double rainbow :rainbow: or double container? :thinking:

Today in the meeting @milroy mentioned a different kind of job that uses MPI and PyTorch (I think), and we want to be able to support this case. When we have a link to the files / containers / details I can work on this (and I'm looking forward to it!)

milroy commented 2 years ago

The capability is needed by a use case from the AHA MoleS workflow that currently runs under Flux. The workflow is composed of MPI-based docking + PyTorch-based fusion. I expect we will see many more workflows that consist of multiple containers per pod in the future.

I don't have an example manifest for this workflow yet, but I plan to start porting the workflow once the basic capability is in the Flux Operator. (I'm also planning a hackathon with the workflow developers.) Here's a basic example of running two containers in a pod with the MPI Operator (untested):

apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: lammps-amg  # "+" is not allowed in Kubernetes object names
spec:
  slotsPerWorker: 1
  runPolicy:
    cleanPodPolicy: Running
  sshAuthMountPath: /root/.ssh
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
          - image: milroy1/kf-testing:lammps-focal-openmpi-4.1.2-flux
            imagePullPolicy: Always
            name: mpi-launcher
            command:
            - bash
            - -cx 
            - ". /etc/profile && mpirun --allow-run-as-root --mca orte_launch_agent /opt/view/bin/orted --mca plm_rsh_agent rsh -x PATH -x LD_LIBRARY_PATH -np 2 --map-by socket lmp -v x 4 -v y 2 -v z 2 -in in.reaxc.hns -nocite"
            resources:
              limits:
                cpu: 1
                memory: 2Gi
              requests:
                cpu: 1
                memory: 2Gi
          - image: milroy1/kf-testing:amg-focal-openmpi-4.1.2-amd
            imagePullPolicy: Always
            name: mpi-launcher-amg  # container names must be unique within a pod
            command:
            - bash
            - -cx
            - ". /etc/profile && time -p mpirun --allow-run-as-root --mca orte_launch_agent /opt/view/bin/orted --mca plm_rsh_agent rsh -x PATH -x LD_LIBRARY_PATH -np 2 --map-by numa --rank-by core --bind-to core amg -n 4 4 4 -P 2 1 1"
            resources:
              limits:
                cpu: 1
                memory: 2Gi
              requests:
                cpu: 1
                memory: 2Gi
          tolerations:
          - key: "launcher"
            operator: "Exists"
            effect: "NoSchedule"
    Worker:
      replicas: 2
      template:
        metadata:
          labels:
            app: lammps
        spec:
          containers:
          - image: milroy1/kf-testing:lammps-focal-openmpi-4.1.2-flux
            imagePullPolicy: Always
            name: worker
            lifecycle:
              postStart:
                exec:
                  command:
                  - bash
                  - -c
                  - "while ! bash -c \"</dev/tcp/localhost/22\" >/dev/null 2>&1; do sleep 0.1; done"
            command:
            - /usr/sbin/sshd
            args:
            - -De
            resources:
              limits:
                cpu: 1
                memory: 2Gi
              requests:
                cpu: 1
                memory: 2Gi
          - image: milroy1/kf-testing:amg-focal-openmpi-4.1.2-amd
            imagePullPolicy: Always
            name: worker-amg  # container names must be unique within a pod
            lifecycle:
              postStart:
                exec:
                  command:
                  - bash
                  - -c
                  - "while ! bash -c \"</dev/tcp/localhost/22\" >/dev/null 2>&1; do sleep 0.1; done"
            command:
            - /usr/sbin/sshd
            args:
            - -De
            resources:
              limits:
                cpu: 1
                memory: 2Gi
              requests:
                cpu: 1
                memory: 2Gi
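
Stripped of the MPI Operator specifics, the capability being requested is a pod spec with more than one entry under `containers`. A minimal sketch with a plain Kubernetes Pod follows (also untested). The pod name, the container names, and the `sleep` commands are placeholders I made up for illustration; the images are the ones from the manifest above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-containers          # placeholder name
spec:
  containers:
  # First application container (image taken from the manifest above)
  - name: lammps                # placeholder; must differ from the name below
    image: milroy1/kf-testing:lammps-focal-openmpi-4.1.2-flux
    command: ["sleep", "infinity"]   # placeholder command
  # Second application container in the same pod
  - name: amg                   # placeholder
    image: milroy1/kf-testing:amg-focal-openmpi-4.1.2-amd
    command: ["sleep", "infinity"]   # placeholder command
```

Two things to keep in mind with this pattern: container names must be unique within a pod, and all containers in a pod share one network namespace, so only one of them can listen on a given port (port 22, for example).
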

vsoch commented 2 years ago

Can you give me a high-level understanding of how the two containers should interact? It looks like amg is starting an SSH server, and I'm guessing something from the flux container is supposed to be able to interact with it. How do I test that?

Update: I'm seeing that there is a set of "Worker" containers and a set of "Launcher" containers, and they appear to be the same. I'm fairly far along with adding this to the Flux Operator, but I'd like to understand the relationship between these two sets, "Worker" and "Launcher." If this is specific to the MPI Operator, I'm wondering whether the Flux Operator just needs one set of the containers in a pod, still with the postStart lifecycle hook?

vsoch commented 1 year ago

This is technically done. We don't have good examples for the actual containers yet, but I'll work on this soon.
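
Until real examples exist, a two-container MiniCluster might look roughly like the sketch below. Treat it as an illustration under assumptions rather than a tested manifest: the `flux-framework.org/v1alpha1` API version and the `size`, `containers`, and `runFlux` fields reflect my understanding of the MiniCluster CRD and should be checked against the CRD reference; the resource name and the `sleep` command are placeholders; the images and the `lmp` command are borrowed from the MPI Operator example earlier in this thread.

```yaml
apiVersion: flux-framework.org/v1alpha1   # assumed API version
kind: MiniCluster
metadata:
  name: lammps-amg                        # placeholder name
spec:
  size: 2                                 # number of pods in the MiniCluster
  containers:
  # Container that runs Flux and launches the job (assumed: exactly one sets runFlux)
  - image: milroy1/kf-testing:lammps-focal-openmpi-4.1.2-flux
    runFlux: true
    command: lmp -v x 4 -v y 2 -v z 2 -in in.reaxc.hns -nocite
  # Second container in the same pod; placeholder command keeps it alive
  - image: milroy1/kf-testing:amg-focal-openmpi-4.1.2-amd
    command: sleep infinity
```

Unlike the MPI Operator manifest, there is no separate "Launcher" and "Worker" set in this sketch: every pod gets the same list of containers.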