ai4os / ai4-ansible

Ansible roles to deploy the federated nomad cluster
Apache License 2.0

Docker Daemon Configuration Not Loading Correctly #41

Open Sftobias opened 1 month ago

Sftobias commented 1 month ago

Issue: Docker Daemon Configuration Not Loading Correctly

Description: The Ansible role for configuring the Docker daemon does not load the expected configuration correctly. Instead of the desired configuration, the following is being applied:

{
  "default-runtime": "nvidia",
  "exec-opts": [
    "native.cgroupdriver=cgroupfs"
  ],
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

However, the expected configuration should include additional parameters such as data-root and registry-mirrors. The correct configuration should look like this:

{
  "data-root": "/mnt/data",
  "default-runtime": "nvidia",
  "exec-opts": [
    "native.cgroupdriver=cgroupfs"
  ],
  "registry-mirrors": [
    "https://repo.ifca.es:5000"
  ],
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Expected behavior: The Ansible role should configure the Docker daemon with the complete expected settings, including the data-root and registry-mirrors fields.

Steps to reproduce:

  1. Run the Ansible roles.
  2. Check the resulting /etc/docker/daemon.json configuration file.
  3. Compare the loaded configuration with the expected one.
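Step 3 can be done by hand, but a small script makes the missing keys obvious. The following is only a minimal sketch, not part of the role: the names `missing_or_different`, `check_daemon_json`, `EXPECTED` and `APPLIED` are hypothetical, and the two dictionaries are copied from the configurations shown above.

```python
import json

# Expected settings, copied from the issue description.
EXPECTED = {
    "data-root": "/mnt/data",
    "default-runtime": "nvidia",
    "exec-opts": ["native.cgroupdriver=cgroupfs"],
    "registry-mirrors": ["https://repo.ifca.es:5000"],
    "runtimes": {
        "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
    },
}

# Configuration actually written by the role, also copied from the issue.
APPLIED = {
    "default-runtime": "nvidia",
    "exec-opts": ["native.cgroupdriver=cgroupfs"],
    "runtimes": {
        "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
    },
}


def missing_or_different(expected, actual):
    """Return the expected keys that are absent from, or differ in, `actual`."""
    return {
        key: {"expected": value, "actual": actual.get(key)}
        for key, value in expected.items()
        if actual.get(key) != value
    }


def check_daemon_json(path="/etc/docker/daemon.json"):
    """Compare the daemon.json on a real host against EXPECTED (not called here)."""
    with open(path) as fh:
        return missing_or_different(EXPECTED, json.load(fh))


# Compare the two configurations quoted in this issue.
print(json.dumps(missing_or_different(EXPECTED, APPLIED), indent=2))
```

For the two configurations quoted in this issue, the script reports only `data-root` and `registry-mirrors`. On an affected node, `check_daemon_json()` runs the same comparison against the real file.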

Possible cause: The Ansible roles are applying the default Docker configuration, or are not including all the expected parameters.

Environment:

Additional context:

micafer commented 1 month ago

Hi @Sftobias. The "data-root": "/mnt/data" option should be added only for the "nomad_volume" nodes, not in all cases, right?

Sftobias commented 1 month ago

That's correct. In other words, it should be added on the clients (every client has a volume mounted, afaik). But these options are not being applied even when a volume is defined; I ran into problems adding some GPU nodes.

micafer commented 1 month ago

And is "registry-mirrors" also needed on all the "nomad_clients", or only on the volume ones?

Sftobias commented 1 month ago

I think it should be on every client that uses Docker, so it should always be present.