Azure / acs-engine

WE HAVE MOVED: Please join us at Azure/aks-engine!
https://github.com/Azure/aks-engine
MIT License

DC/OS - Unable to create external volume #569

Closed blucas closed 7 years ago

blucas commented 7 years ago

I'm unable to deploy a containerized app that uses an external volume with the DC/OS (1.9) orchestrator on Azure. According to the error message, the rexray plugin is not installed. AFAIK DC/OS is supposed to come with RexRay installed, as you can see in the component diagram here: https://dcos.io/docs/1.9/overview/architecture/components/

Error:

I0503 11:38:24.925202 33890 exec.cpp:162] Version: 1.2.1
I0503 11:38:24.930164 33901 exec.cpp:237] Executor registered on agent d20c1526-e87d-4903-ab43-7d74510faf5d-S0
docker: Error response from daemon: create nginx-data-vol: create nginx-data-vol: Error looking up volume plugin rexray: plugin not found.
See 'docker run --help'.
W0503 11:38:24.930164 33890 logging.cpp:91] RAW: Received signal SIGTERM from process 4688 of user 0; exiting

Marathon JSON Template:

{
  "id": "/nginx",
  "cmd": null,
  "cpus": 0.5,
  "mem": 128,
  "disk": 0,
  "instances": 1,
  "acceptedResourceRoles": [
    "slave_public"
  ],
  "container": {
    "type": "DOCKER",
    "volumes": [
      {
        "containerPath": "/usr/share/nginx/html",
        "mode": "RW",
        "external": {
          "name": "nginx-data-vol",
          "provider": "dvdi",
          "options": {
            "dvdi/driver": "rexray"
          }
        }
      },
      {
        "containerPath": "/etc/nginx",
        "mode": "RW",
        "external": {
          "name": "nginx-conf-vol",
          "provider": "dvdi",
          "options": {
            "dvdi/driver": "rexray"
          }
        }
      }
    ],
    "docker": {
      "image": "nginx",
      "network": "BRIDGE",
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 0,
          "servicePort": 10151,
          "protocol": "tcp",
          "name": "http",
          "labels": {}
        }
      ],
      "privileged": false,
      "parameters": [],
      "forcePullImage": false
    }
  },
  "healthChecks": [
    {
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "timeoutSeconds": 20,
      "maxConsecutiveFailures": 3,
      "portIndex": 0,
      "path": "/",
      "protocol": "MESOS_HTTP",
      "delaySeconds": 15
    }
  ],
  "labels": {
    "HAPROXY_GROUP": "external",
    "HAPROXY_0_VHOST": "mysite.com",
    "HAPROXY_0_MODE": "http",
    "HAPROXY_0_ENABLED": "true"
  },
  "portDefinitions": [
    {
      "port": 10151,
      "protocol": "tcp",
      "name": "default",
      "labels": {}
    }
  ],
  "upgradeStrategy": {
    "minimumHealthCapacity": 0.5,
    "maximumOverCapacity": 0
  }
}

JackQuincy commented 7 years ago

Strange. Our cloud-init is identical to what dcos/dcos has for setting this up. In the list of packages we have "rexray--869621bb411c9f2a793ea42cdfeed489e1972aaa", and then we have the config for it like this:

- content: |
    rexray:
      loglevel: info
      modules:
        default-admin:
          host: tcp://127.0.0.1:61003
        default-docker:
          disabled: true
  path: /etc/rexray/config.yml
  permissions: '0644'

So I'm surprised this isn't working. @xtophs Do you know if we have something misconfigured or did @blucas miss a step?

JackQuincy commented 7 years ago

@rgardler realized you might have context here too.

JackQuincy commented 7 years ago

@wbuchwalter Do you know what is happening here?

blucas commented 7 years ago

An update for you guys. I decided to try this out on a vanilla installation of DC/OS (1.8) using ACS rather than acs-engine (i.e. using portal.azure.com to create the DC/OS cluster). It turns out that I get the same error there as well. It also seems that their documentation for 1.8 states that RexRay does not support Azure, but the 1.9 documentation has removed that statement.

I think the main thing here is that the default DC/OS installation does not enable the rexray plugin.
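
One way to confirm this on an agent is to look for the plugin's discovery file. This is a minimal sketch, assuming the legacy Docker plugin discovery layout: sockets under /run/docker/plugins, and .spec or .json files under /etc/docker/plugins and /usr/lib/docker/plugins. The helper name `find_plugin` is hypothetical, and the check only covers files placed directly in those directories.

```python
from pathlib import Path

# Directories Docker searches for legacy (v1) plugins.
DEFAULT_PLUGIN_DIRS = (
    "/run/docker/plugins",      # unix sockets, e.g. rexray.sock
    "/etc/docker/plugins",      # .spec / .json files
    "/usr/lib/docker/plugins",  # .spec / .json files
)
PLUGIN_SUFFIXES = (".sock", ".spec", ".json")


def find_plugin(name, plugin_dirs=DEFAULT_PLUGIN_DIRS):
    """Return every discovery file for plugin `name` found in plugin_dirs.

    Hypothetical helper for illustration; it does not talk to the Docker daemon.
    """
    matches = []
    for directory in plugin_dirs:
        for suffix in PLUGIN_SUFFIXES:
            candidate = Path(directory) / f"{name}{suffix}"
            if candidate.exists():
                matches.append(str(candidate))
    return matches


if __name__ == "__main__":
    found = find_plugin("rexray")
    if found:
        print("rexray discovery file(s):", ", ".join(found))
    else:
        print("no rexray discovery file found; 'plugin not found' is expected")
```

If nothing is found on the agent, Docker has no way to resolve `--volume-driver=rexray`, which would match the "plugin not found" error.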

Do you guys think this is something you can speak to Mesosphere about getting corrected? RexRay has supported Azure UnmanagedDisk storage since Feb 2017.

Error:

I0504 08:43:19.954798  7609 exec.cpp:161] Version: 1.0.3
I0504 08:43:19.956876  7624 exec.cpp:236] Executor registered on agent 0c5a270b-1f3e-49e4-af4e-d65d9c8e619d-S0
I0504 08:43:19.958153  7622 docker.cpp:815] Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e MARATHON_APP_VERSION=2017-05-04T08:25:42.519Z -e HOST=10.0.0.4 -e MARATHON_APP_RESOURCE_CPUS=1.0 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=nginx -e MESOS_TASK_ID=nginx.b978376e-30a5-11e7-b6a1-024206bfed62 -e PORT=27577 -e MARATHON_APP_RESOURCE_MEM=128.0 -e PORTS=27577 -e PORT_HTTP=27577 -e MARATHON_APP_RESOURCE_DISK=0.0 -e PORT_80=27577 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/nginx -e PORT0=27577 -e LIBPROCESS_IP=10.0.0.4 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-0c5a270b-1f3e-49e4-af4e-d65d9c8e619d-S0.fbc7ddf5-d6cb-4a02-81f4-f39ed88ba0fd -v conf:/etc/nginx:rw -v /var/lib/mesos/slave/slaves/0c5a270b-1f3e-49e4-af4e-d65d9c8e619d-S0/frameworks/0c5a270b-1f3e-49e4-af4e-d65d9c8e619d-0000/executors/nginx.b978376e-30a5-11e7-b6a1-024206bfed62/runs/fbc7ddf5-d6cb-4a02-81f4-f39ed88ba0fd:/mnt/mesos/sandbox --volume-driver=rexray --net bridge -p 27577:80/tcp --name mesos-0c5a270b-1f3e-49e4-af4e-d65d9c8e619d-S0.fbc7ddf5-d6cb-4a02-81f4-f39ed88ba0fd nginx
docker: Error response from daemon: create conf: create conf: Error looking up volume plugin rexray: plugin not found.
See 'docker run --help'.
W0504 08:43:19.958153  7609 logging.cpp:91] RAW: Received signal SIGTERM from process 26415 of user 0; exiting

Marathon JSON Config:

{
  "id": "/kdhsfiuaydhsf",
  "cmd": null,
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 1,
  "executor": null,
  "fetch": null,
  "constraints": null,
  "acceptedResourceRoles": null,
  "user": null,
  "container": {
    "type": "MESOS",
    "volumes": [
      {
        "containerPath": "/etc/nginx",
        "external": {
          "name": "my-vol",
          "provider": "dvdi",
          "options": {
            "dvdi/driver": "rexray"
          }
        },
        "mode": "RW"
      }
    ]
  },
  "labels": null,
  "healthChecks": null,
  "env": null,
  "updateStrategy": {
    "maximumOverCapacity": 0,
    "minimumHealthCapacity": 0
  },
  "portDefinitions": [
    {
      "protocol": "tcp",
      "port": 0
    }
  ]
}

JackQuincy commented 7 years ago

I asked on the DC/OS Slack: rexray 0.8.0 supports Azure, but they are only on 0.4.0. They also had issues with the fact that we normally deploy on VMSS, and you can't attach a disk to a single node in a VMSS. So this is a known issue, and supporting it would be a feature update. Even with that update, it would only work on VMAS.

Edit: corrected the version info. I originally wrote the libstorage version, not the rexray version.

anhowe commented 7 years ago

Closing since this is supported using VMAS (availability sets) and rexray 0.8.0. An example of availability sets is here: https://github.com/Azure/acs-engine/blob/master/examples/largeclusters/dcos-vmas.json
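
For anyone checking their own cluster definition before deploying, here is a minimal sketch that flags agent pools not explicitly set to availability sets. It assumes the acs-engine API model keeps this under `properties.agentPoolProfiles[].availabilityProfile` with the value `AvailabilitySet`. The embedded model is a trimmed, hypothetical example, not the contents of the linked file, and the function name is made up for illustration.

```python
import json

# Hypothetical, trimmed-down cluster definition for illustration only.
EXAMPLE_API_MODEL = """
{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {"orchestratorType": "DCOS"},
    "agentPoolProfiles": [
      {"name": "agentprivate", "count": 3, "availabilityProfile": "AvailabilitySet"},
      {"name": "agentpublic", "count": 1}
    ]
  }
}
"""


def pools_without_availability_sets(api_model):
    """Return names of agent pools not explicitly set to AvailabilitySet.

    Pools that omit the field are reported too, on the assumption that
    the default may be a scale set (VMSS).
    """
    pools = api_model.get("properties", {}).get("agentPoolProfiles", [])
    return [
        pool.get("name", "<unnamed>")
        for pool in pools
        if pool.get("availabilityProfile") != "AvailabilitySet"
    ]


model = json.loads(EXAMPLE_API_MODEL)
print(pools_without_availability_sets(model))
```

Any pool this reports would need `availabilityProfile` set to `AvailabilitySet` before per-node disk attachment, and therefore rexray external volumes, could be expected to work.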