dnephin / dobi

A build automation tool for Docker applications
https://dnephin.github.io/dobi/
Apache License 2.0

Support jobs with DeviceRequests #231

Closed sih4sing5hong5 closed 9 months ago

sih4sing5hong5 commented 10 months ago

Background

The NVIDIA Container Toolkit enables users to build and run GPU-accelerated containers.

With the docker command, add --gpus all, e.g.:

docker run --rm --gpus all ubuntu nvidia-smi

With docker-compose, add devices under deploy in docker-compose.yml, e.g.:

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Expectation

Support running dobi jobs with GPUs through a new configuration option.

How to

I'm not familiar with Go. Thanks to dnephin for pointing out where dobi's configuration is handled in https://github.com/dnephin/dobi/issues/169#issuecomment-562880886. I looked through fsouza/go-dockerclient and found the DeviceRequests parameter!

I think the solution is to add DeviceRequests to the HostConfig in run.go: https://github.com/dnephin/dobi/blob/76e10c8c263f6b4903c122a4b5f84cacc373804b/tasks/job/run.go#L277-L282

Questions

How to design the usage of DeviceRequests in dobi.yaml

I'm not sure whether the design below is a good one:

job=my-job:
    use: my-image
    devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

How to build the dobi

I followed the docs and tried to build dobi myself, but got errors:

$ docker run -ti --rm -w $(pwd) -v $(pwd):$(pwd) -e DOCKER_HOST -v /var/run/docker.sock:/var/run/docker.sock dnephin/dobi:0.13.0 deps binary

...

Step 8/11 : RUN     go get -d github.com/golang/mock/mockgen &&         cd /go/src/github.com/golang/mock &&         git checkout -q "v1.0.0" &&         go build -v -o /usr/local/bin/mockgen ./mockgen &&         rm -rf /go/src/* /go/pkg/* /go/bin/*
 ---> Running in 09069f4c0946
package io/fs: unrecognized import path "io/fs" (import path does not begin with hostname)
[ERROR] failed to execute task "builder:build": The command '/bin/sh -c go get -d github.com/golang/mock/mockgen &&         cd /go/src/github.com/golang/mock &&         git checkout -q "v1.0.0" &&         go build -v -o /usr/local/bin/mockgen ./mockgen &&         rm -rf /go/src/* /go/pkg/* /go/bin/*' returned a non-zero code: 1

May I make a PR for this feature?

dobi is a great tool and useful for teaching machine learning! If that's OK with you, I will try to make a PR to contribute :smiley:

dnephin commented 10 months ago

Hello, if you make a PR for this I can review it. It shouldn't be a big change hopefully.

I think the build error is because the Dockerfiles in https://github.com/dnephin/dobi/tree/main/dockerfiles are still using old versions of Go and alpine. They probably need to be updated to Go 1.20.

sih4sing5hong5 commented 10 months ago

Thank you, I'll try it!

sih4sing5hong5 commented 9 months ago

I ended up running the GPU job directly with compose. Thank you, dnephin :)