ThinkParQ / beegfs-csi-driver

The BeeGFS Container Storage Interface (CSI) driver provides high performing and scalable storage for workloads running in Kubernetes. 📦 🐝
Apache License 2.0

Feedback on Nomad CSI #7

Closed scaleoutsean closed 1 year ago

scaleoutsean commented 2 years ago
```
Apr 20 14:57:10 b5 nomad[6938]:     2022-04-20T14:57:10.809Z [ERROR] client.alloc_runner.task_runner.task_hook: killing task because plugin failed: alloc_id=e2f36449-9542-e65d-0769-6f8e15aa32c3 task=plugin error="CSI plugin failed probe: timeout while connecting to gRPC socket: failed to stat socket: stat /opt/nomad/data/client/csi/plugins/e2f36449-9542-e65d-0769-6f8e15aa32c3/csi.sock: no such file or directory"
```
ejweber commented 2 years ago

Thanks for taking a look @scaleoutsean!

scaleoutsean commented 2 years ago

That's helpful, thank you.

> More to your point, however, the way we have plugin.nomad set up, omitting either csi-beegfs-config.yaml or csi-beegfs-connauth.yaml will cause a driver failure. The driver will expect to find at least a blank file at both paths.

I went with that assumption (along with a few others for other steps, which made troubleshooting harder, since several assumptions were in play at once). I tried to leave connAuth empty, but later I also configured connection authentication to see whether the lack of it was causing my problems (it seemed it wasn't). Which brings me to this part of the page you linked above:

> NOTE: beegfs-client.conf values MUST be specified as strings, even if they appear to be integers or booleans (e.g. "8000", not 8000 and "true", not true).

I took that to refer only to values in YAML config files and not to the secret string in the connAuth file (sample here), since that file is a configuration template rather than an argument passed to the beegfs-client binary.

But as I just discovered, that's not the case: surrounding the values in plugin.nomad's docker template (connAuth, connUseRDMA) with double quotes seems to have helped, and now the plugin works. So that may be a bug in the reference plugin.nomad file. And indeed, volume.hcl (from the example) indicates that quotes should be used.
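For illustration, here is the corrected template stanza with the values quoted (a sketch based on the plugin.nomad shown below; connMgmtdPortTCP is included purely as an example of a numeric value):

```hcl
template {
  data        = <<EOH
config:
  beegfsClientConf:
    # Quoted: the YAML parser would otherwise type these as a boolean and
    # an integer and refuse to unmarshal them into a map[string]string.
    connUseRDMA: "false"
    connMgmtdPortTCP: "8008"
EOH
  destination = "${NOMAD_TASK_DIR}/csi-beegfs-config.yaml"
}
```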

I still can't create a volume (getting `Failed to load map file: /opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.195_mnt_beegfs_nomad_VOLUME__NAME/beegfs-client.conf`), but I just got to this step, so I'll investigate further.

ejweber commented 2 years ago

Interesting!

This commit added plugin.nomad on 08/05/21, and it was tested in a Nomad environment we had set up at the time. plugin.nomad hasn't substantially changed since then.

This commit changed the YAML parser the driver uses on 11/20/21. Both csi-beegfs-config.yaml and csi-beegfs-connauth.yaml are parsed in the same way, so the warning applies to both. When that commit went in, we updated all YAML files in the project, but it looks like we missed the Nomad templates (as you found).

As far as I understand (and remember), only values that might otherwise be interpreted as non-strings must be quoted. In the PR you submitted, it makes sense to me that "true" should be quoted; in my mind, this is what was causing the issue. I do not think "1.1.1.1" or "secret1" need to be quoted, and I would be curious whether the driver would run without them.

scaleoutsean commented 2 years ago

You're right, it works with only the boolean value surrounded by quotes. But:

The details from bullet two are in another file not changed in my PR, so if you want to selectively surround only boolean values with quotes, that's fine; the PR allows "minor edits from maintainers". But if you make edits, I would suggest also updating the other example and that note (this can be done by you in a separate PR) to minimize confusion for users.

Since I'm already editing this comment, I'll add that yesterday I was thinking about suggesting we move /deploy and /examples into /docs and make everything under /docs publishable to GH Pages or elsewhere. I didn't mention it because it's not a big pain point now, but if additional changes are made to the docs, we could use the opportunity to do that.

ejweber commented 2 years ago

Any group of characters beginning with an alphabetic or numeric character in a YAML file is interpreted as a string unless it belongs to a special group (like integer, boolean, time, etc.). Additionally, any value can be forced to be interpreted as a string using double quotes. My intention with that beegfs-client.conf comment was to remind users to force special values in the config section to strings (because the Kubernetes YAML parser, for whatever reason, will error out instead of unmarshalling true to "true" in a map[string]string). I didn't intend it to mean "quote everything", but quotes around everything certainly don't hurt!

ejweber commented 2 years ago

We'd like to incorporate your PR into the upcoming 1.2.2 release. Our current process does not allow us to merge PRs directly on GitHub, but we can pull the commits in, test them in our infrastructure (which currently doesn't include a Nomad deployment, so it's just a formality), and include them when the release goes live. It'd be best if we did that with a fully working example, though. Hopefully we can get to the bottom of the remaining issue. To that end, if there are additional commands or output you can share, I'd be happy to try to help troubleshoot.

scaleoutsean commented 2 years ago

Do you mean for the next step (`volume create`)? Sure, I haven't been able to figure that one out.

```
$ sudo beegfs-ctl --listnodes --nodetype=mgmt
b1 [ID: 1]

$ sudo beegfs-ctl --listnodes --nodetype=client
9E22-6260ECCA-b5 [ID: 1]

$ nslookup b1
Non-authoritative answer:
Name:   b1
Address: 192.168.1.191

$ nslookup b5
Non-authoritative answer:
Name:   b5
Address: 127.0.2.1
Name:   b5
Address: 192.168.1.195
```
job "beegfs-csi-plugin" {
  type = "system"
  datacenters = ["dc1"]
  group "csi" {
    task "plugin" {
      driver = "docker"
      template {
        data        = <<EOH
config:
  beegfsClientConf:
    connUseRDMA: "false"
        EOH
        destination = "${NOMAD_TASK_DIR}/csi-beegfs-config.yaml"
      }
      template {
        data        = <<EOH
- connAuth: secret
  sysMgmtdHost: 192.168.1.191
        EOH
        destination = "${NOMAD_SECRETS_DIR}/csi-beegfs-connauth.yaml"
      }
      config {
        mount {
          type     = "bind"
          target   = "/host"
          source   = "/"
          readonly = true
        }
        image = "netapp/beegfs-csi-driver:v1.2.1"
        args = [
          "--driver-name=beegfs.csi.netapp.com",
          "--client-conf-template-path=/host/etc/beegfs/beegfs-client.conf",
          "--cs-data-dir=/opt/nomad/data/client/csi/monolith/beegfs-plugin0",
          "--config-path=${NOMAD_TASK_DIR}/csi-beegfs-config.yaml",
          "--connauth-path=${NOMAD_SECRETS_DIR}/csi-beegfs-connauth.yaml",
          "--v=5",
          "--endpoint=unix://opt/nomad/data/client/csi/monolith/beegfs-plugin0/csi.sock",
          "--node-id=node-${NOMAD_ALLOC_INDEX}",
        ]
        privileged = true
      }
      csi_plugin {
        id = "beegfs-plugin0"
        type = "monolith"
        mount_dir = "/opt/nomad/data/client/csi/monolith/beegfs-plugin0"
      }
      resources {
        cpu = 256
        memory = 128
      }
    }
  }
}
```

```
$ nomad plugin status beegfs-plugin
ID                   = beegfs-plugin0
Provider             = beegfs.csi.netapp.com
Version              = v1.2.1-0-g316c1cd
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 1
Nodes Expected       = 1

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
ca0a455e  a987e631  csi         0        run      running  19m58s ago  19m52s ago
```
id = "VOLUME"
name = "VOLUME"
type = "csi"
plugin_id = "beegfs-plugin0"
capacity_min = "1MB"
capacity_max = "1GB"
capability {
  access_mode = "single-node-reader-only"

  attachment_mode = "file-system"
}
capability {
  access_mode = "single-node-writer"
  attachment_mode = "file-system"
}
parameters {
  sysMgmtdHost   = "192.168.1.191"
  volDirBasePath = "/mnt/beegfs/dyn"
}
```

```
[BeeGFS Control Tool Version: 7.3.0
Refer to the default config file (/etc/beegfs/beegfs-client.conf)
or visit http://www.beegfs.com to find out about configuration options.]

: exit status 1

)
```


- When I check that path, the file does not exist, but maybe it is created and removed too quickly. 

```
$ dir -lat /mnt/beegfs/dyn/VOLUME
dir: cannot access '/mnt/beegfs/dyn/VOLUME': No such file or directory
```

```
$ sudo dir -lat /opt/nomad/data/client/csi/monolith/beegfs-plugin0/
total 8
drwx------ 5 root root 4096 Apr 22 01:57 ..
drwx------ 2 root root 4096 Apr 20 14:23 .
```


- Here's what I see in the controller logs; the first line shows the client files being written.

```
I0422 16:10:53.071933       1 beegfs_util.go:62]  "msg"="Writing client files" "reqID"="009b" "path"="/opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME" "volumeID"="beegfs://192.168.1.191/mnt/beegfs/dyn/VOLUME"
I0422 16:10:53.072271       1 beegfs_ctl.go:34]  "msg"="Creating BeeGFS directory" "reqID"="009b" "path"="/mnt/beegfs/dyn/VOLUME" "volumeID"="beegfs://192.168.1.191/mnt/beegfs/dyn/VOLUME"
I0422 16:10:53.072320       1 beegfs_ctl.go:138]  "msg"="Executing command" "reqID"="009b" "command"=["beegfs-ctl","--cfgFile=/opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME/beegfs-client.conf","--unmounted","--getentryinfo","/mnt/beegfs/dyn/VOLUME"]
I0422 16:10:53.076350       1 beegfs_ctl.go:161]  "msg"="stderr from command" "reqID"="009b" "command"=["beegfs-ctl","--cfgFile=/opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME/beegfs-client.conf","--unmounted","--getentryinfo","/mnt/beegfs/dyn/VOLUME"] "stderr"="\nError: Failed to load map file: /opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME/beegfs-client.conf\n\n[BeeGFS Control Tool Version: 7.3.0\nRefer to the default config file (/etc/beegfs/beegfs-client.conf)\nor visit http://www.beegfs.com to find out about configuration options.]\n\n"
I0422 16:10:53.077025       1 beegfs_util.go:270]  "msg"="Unmounting volume from path" "reqID"="009b" "path"="/opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME/mount" "volumeID"="beegfs://192.168.1.191/mnt/beegfs/dyn/VOLUME"
W0422 16:10:53.077259       1 mount_helper_common.go:33] Warning: Unmount skipped because path does not exist: /opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME/mount
I0422 16:10:53.077409       1 beegfs_util.go:283]  "msg"="Cleaning up path" "reqID"="009b" "path"="/opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME" "volumeID"="beegfs://192.168.1.191/mnt/beegfs/dyn/VOLUME"
E0422 16:10:53.077790       1 server.go:195]  "msg"="GRPC error" "error"="rpc error: code = Internal desc = beegfs-ctl failed with stdOut:  and stdErr: \nError: Failed to load map file: /opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME/beegfs-client.conf\n\n[BeeGFS Control Tool Version: 7.3.0\nRefer to the default config file (/etc/beegfs/beegfs-client.conf)\nor visit http://www.beegfs.com to find out about configuration options.]\n\n: exit status 1: beegfs-ctl failed with stdOut:  and stdErr: \nError: Failed to load map file: /opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME/beegfs-client.conf\n\n[BeeGFS Control Tool Version: 7.3.0\nRefer to the default config file (/etc/beegfs/beegfs-client.conf)\nor visit http://www.beegfs.com to find out about configuration options.]\n\n: exit status 1" "fullError"="exit status 1\nbeegfs-ctl failed with stdOut:  and stdErr: \nError: Failed to load map file: /opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME/beegfs-client.conf\n\n[BeeGFS Control Tool Version: 7.3.0\nRefer to the default config file (/etc/beegfs/beegfs-client.conf)\nor visit http://www.beegfs.com to find out about configuration options.]\n\n\ngithub.com/netapp/beegfs-csi-driver/pkg/beegfs.(*beegfsCtlExecutor).execute\n\t/var/lib/jenkins/workspace/beegfs-csi-driver_master@2/pkg/beegfs/beegfs_ctl.go:154\ngithub.com/netapp/beegfs-csi-driver/pkg/beegfs.(*beegfsCtlExecutor).statDirectoryForVolume\n\t/var/lib/jenkins/workspace/beegfs-csi-driver_master@2/pkg/beegfs/beegfs_ctl.go:70\ngithub.com/netapp/beegfs-csi-driver/pkg/beegfs.(*beegfsCtlExecutor).createDirectoryForVolume\n\t/var/lib/jenkins/workspace/beegfs-csi-driver_master@2/pkg/beegfs/beegfs_ctl.go:36\ngithub.com/netapp/beegfs-csi-driver/pkg/beegfs.(*controllerServer).CreateVolume\n\t/var/lib/jenkins/workspace/beegfs-csi-driver_master@2/pkg/beegfs/controllerserver.go:139\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler.func1\n\t/var/lib/jenkins/workspace/beegfs-csi-driver_master@2/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5676\ngithub.com/netapp/beegfs-csi-driver/pkg/beegfs.logGRPC\n\t/var/lib/jenkins/workspace/beegfs-csi-driver_master@2/pkg/beegfs/server.go:193\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler\n\t/var/lib/jenkins/workspace/beegfs-csi-driver_master@2/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5678\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/var/lib/jenkins/workspace/beegfs-csi-driver_master@2/vendor/google.golang.org/grpc/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/var/lib/jenkins/workspace/beegfs-csi-driver_master@2/vendor/google.golang.org/grpc/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/var/lib/jenkins/workspace/beegfs-csi-driver_master@2/vendor/google.golang.org/grpc/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\nrpc error: code = Internal desc = beegfs-ctl failed with stdOut:  and stdErr: \nError: Failed to load map file: /opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME/beegfs-client.conf\n\n[BeeGFS Control Tool Version: 7.3.0\nRefer to the default config file (/etc/beegfs/beegfs-client.conf)\nor visit http://www.beegfs.com to find out about configuration options.]\n\n: exit status 1" "reqID"="009b" "method"="/csi.v1.Controller/CreateVolume" 
"request"="{\"accessibility_requirements\":{},\"capacity_range\":{\"limit_bytes\":1000000000,\"required_bytes\":1000000},\"name\":\"VOLUME\",\"parameters\":{\"sysMgmtdHost\":\"192.168.1.191\",\"volDirBasePath\":\"/mnt/beegfs/dyn\"},\"volume_capabilities\":[{\"AccessType\":{\"Mount\":{}},\"access_mode\":{\"mode\":2}},{\"AccessType\":{\"Mount\":{}},\"access_mode\":{\"mode\":1}}]}"

There's no log message about the config file being deleted, so I assume it should still be there if it was created, but that doesn't seem to be the case. I haven't looked at the source to see whether deletions are logged as well - if they're not, maybe it'd be good to add that, so we'd know whether the file was created and then quickly removed.

The volume workflow from this repo uses sed to search and replace the volume name in the template file, but that doesn't behave any differently - it fails with the same error.

```
$ sed -e "s/VOLUME_NAME/sean[1]/" "volume.hcl" | nomad volume create -
Error creating volume: Unexpected response code: 500 (1 error occurred:
    * controller create volume: CSI.ControllerCreateVolume: controller plugin returned an internal error, check the plugin allocation logs for more information: rpc error: code = Internal desc = beegfs-ctl failed with stdOut:  and stdErr: 
Error: Failed to load map file: /opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_sean%5B1%5D/beegfs-client.conf

[BeeGFS Control Tool Version: 7.3.0
Refer to the default config file (/etc/beegfs/beegfs-client.conf)
or visit http://www.beegfs.com to find out about configuration options.]

: exit status 1

)
```

I've tried several different things, no luck.

I can't see anything important in the BeeGFS Mgmt server logs - I assume that's because the map file can't be loaded, so nothing gets sent its way when `nomad volume create` is executed.

scaleoutsean commented 2 years ago

More feedback: it's a little hard to tell whether the plugin is actually in good shape by looking at the CLI output (`nomad plugin status`). It seems Nomad shows the plugin as healthy as long as it's up and running.

In this particular case the volume doesn't get created, so we know something is wrong, but until that point it seems hard to tell. For example, I can enter the sysMgmtdHost IP as 1.1.1.1 (edit: I mean in plugin.nomad) and the plugin will still show as healthy in `plugin status` output. Thankfully it doesn't show in the Web UI under CSI > Plugins in that case, so it seems there are some checks involved.

However, when I enter 192.168.1.195 (the node's own IP - it's the Nomad Server, Nomad Client, and BeeGFS Client), the plugin does show in the Web UI. That may be due to its monolithic nature, but it also suggests those health checks may not be reliable.

I wonder if a check command of some sort (maybe even just manually executed curl commands) could be used to test whether a temp volume can be created and deleted.
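For example, piping a throwaway volume spec through `nomad volume create -` and then cleaning it up with `nomad volume delete` (or `nomad volume deregister`) would exercise the whole path end to end. A sketch reusing the parameters from my spec above:

```hcl
# smoke-test.hcl - a disposable volume spec; creating and deleting it
# verifies the plugin can reach the mgmtd service and manage a BeeGFS directory.
id           = "smoke-test"
name         = "smoke-test"
type         = "csi"
plugin_id    = "beegfs-plugin0"
capacity_min = "1MB"
capacity_max = "1MB"
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
parameters {
  sysMgmtdHost   = "192.168.1.191"
  volDirBasePath = "/mnt/beegfs/dyn"
}
```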

ejweber commented 2 years ago

For the "another feedback", it sounds like things are working as expected here. The configuration options provided to the driver in csi-beegfs-config.yaml csi-beegfs-connauth.yaml are used as needed. If a volume is created that references the file system with sysMgmtdHost 1.1.1.1, the driver will use the configuration associated with that file system. If no volume is ever created referencing the file system with sysMgmtdHost 1.1.1.1, the driver never uses that configuration, and simply having that configuration does not constitute an error. Of course, the reverse is also true (and this is one of the main reasons the driver was designed to work this way). You can specify absolutely no configuration and call out the sysMgmtdHos of no specific file systems and still create volumes referencing arbitrary file systems. As long as those file systems don't NEED special configuration, the driver can still mount them.

ejweber commented 2 years ago

The volume creation issues are difficult to troubleshoot without a working Nomad environment, but I can at least give you a bit of context.

The driver writes client configuration files directly to the configured directory (e.g. /opt/nomad/data/client/csi/monolith/beegfs-plugin0). This write happens in the driver's own mount namespace (within the driver container).

When the driver executes beegfs-ctl, it uses chroot to make this execution happen in the host namespace. This allows it to use the beegfs-ctl utility already installed on the host (instead of shipping one). Note that this is even more important for the node service's NodeStageVolume command, as mount calls must reference a client configuration file in the host's namespace.

We take care in both Kubernetes and Nomad to ensure that the driver and the host both see the client configuration files at the same path, so that both can refer to them by that path. In Kubernetes, we accomplish this with a bind mount.

There is an implicit assumption in plugin.nomad that this /opt/nomad/data/client/csi/monolith/beegfs-plugin0 directory is similarly configured out of the box. And there is good evidence to suggest this is the case, as Nomad is writing to a socket in this directory and the driver is reading from the socket in this directory.

That being said, the logs tell a different story. Since there is no error on directory creation, we can safely assume that the driver creates the configuration directory in its mount namespace. Since there is a failure on the part of beegfs-ctl to load the map file, I suspect that the host has a different view of what is or isn't contained in the same directory. This could be the result of some Nomad change since the version we tested with, or an overlooked detail. (That "Cleaning up path" message in the log is an indicator that the client.conf file is blown away after the failure, so I wouldn't expect to be able to find it after the fact.)

scaleoutsean commented 2 years ago

> If no volume is ever created referencing the file system with sysMgmtdHost 1.1.1.1, the driver never uses that configuration, and simply having that configuration does not constitute an error.

That's a good argument in favor of the current approach. It also lets us configure CSI before storage is ready and leave the configuration in place during storage maintenance or downtime.

But I still wonder if at least a warning (if not an outright error status) should be emitted, so the user can tell that the plugin cannot access the SysMgmt host. While in a large cluster there may always be some worker(s) that can't access the SysMgmt IP, it would be useful to know that before the problem bubbles up to applications.

> Since there is a failure on the part of beegfs-ctl to load the map file, I suspect that the host has a different view of what is or isn't contained in the same directory.

I'm also willing to consider that I may have made some incorrect assumptions (as I mentioned, I'm not 100% sure I didn't), so I'm open to rechecking other details or providing additional information about my environment.

I looked at the bind mount and also this note, which made me leave beegfs-client.conf in place, although my Ansible scripts deployed beegfs-client-b1.conf (b1 is the hostname of the SysMgmt worker). But I copied that file to beegfs-client.conf for BeeGFS CSI to load, so the two have identical content.

Distroless-based containers don't have any shell; otherwise it'd be easy to get in and find out what's wrong. So I fell back on a proven technique I used in the '90s whenever Windows 3.1 had trouble finding the correct DLL files...

```
$ sudo mkdir -p /opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME__NAME/
$ sudo cp /etc/beegfs/beegfs-client.conf /opt/nomad/data/client/csi/monolith/beegfs-plugin0/192.168.1.191_mnt_beegfs_dyn_VOLUME__NAME/
```

Then `volume create` worked.

```
$ nomad volume status VOLUME_NAME
ID                   = VOLUME_NAME
Name                 = VOLUME_NAME
External ID          = beegfs://192.168.1.191/mnt/beegfs/dyn/VOLUME_NAME
Plugin ID            = beegfs-plugin0
Provider             = beegfs.csi.netapp.com
Version              = v1.2.1-0-g316c1cd
Schedulable          = true
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 1
Nodes Expected       = 1
Access Mode          = <none>
Attachment Mode      = <none>
Mount Options        = <none>
Namespace            = default

Allocations
No allocations placed
```

This isn't how it's supposed to work, but I may be able to continue testing other things.

Note that Access Mode and other fields are missing from the volume status output, probably due to the hackish workaround. Also, `volume delete` doesn't work (I have to use `volume deregister`). I haven't tried to actually use the volume from a container yet, so who knows if that works or not. Edit: all right, it doesn't... But csi_hook sure looks for some funny path names.

```
failed to setup alloc: pre-run hook "csi_hook" failed: node plugin returned an internal error, check the plugin allocation logs for more information: rpc error: code = Internal desc = beegfs-ctl failed with stdOut: and stdErr: Error: Failed to load map file: /local/csi/staging/VOLUME_NAME/ro-file-system-single-node-reader-only/beegfs-client.conf [BeeGFS Control Tool Version: 7.3.0 Refer to the default config file (/etc/beegfs/beegfs-client.conf) or visit http://www.beegfs.com to find out about configuration options.] : exit status 1
```

ejweber commented 2 years ago

If you could run the plugin.nomad and then capture a docker inspect of the running plugin container, that'd give us information on exactly which directories are and aren't being shared between the host and the container. It's looking like there may be a decent dev lift to rework the way our Nomad examples handle paths (whether due to changes in Nomad, Nomad's CSI support, or the driver itself). The BeeGFS CSI driver is somewhat unique in how picky it is about paths matching inside and outside its container. I wouldn't expect most drivers to be that particular (because most drivers don't execute a mount command that requires host-namespace access to a file they have written).

scaleoutsean commented 2 years ago

Sure!

job "beegfs" {
  type = "system"
  datacenters = ["dc1-f2"]
  group "csi" {
    task "plugin" {
      driver = "docker"
      template {
        data        = <<EOH
config:
  beegfsClientConf:
    connUseRDMA: "false"
        EOH
        destination = "${NOMAD_TASK_DIR}/csi-beegfs-config.yaml"
      }
      template {
        data        = <<EOH
- connAuth: secret
  sysMgmtdHost: 192.168.1.191
        EOH
        destination = "${NOMAD_SECRETS_DIR}/csi-beegfs-connauth.yaml"
      }
      config {
        mount {
          type     = "bind"
          target   = "/host"
          source   = "/"
          readonly = true
        }
        image = "netapp/beegfs-csi-driver:v1.2.1"
        args = [
          "--driver-name=beegfs.csi.netapp.com",
          "--client-conf-template-path=/host/etc/beegfs/beegfs-client.conf",
          "--cs-data-dir=/opt/nomad/data/client/csi/monolith/beegfs-plugin0",
          "--config-path=${NOMAD_TASK_DIR}/csi-beegfs-config.yaml",
          "--connauth-path=${NOMAD_SECRETS_DIR}/csi-beegfs-connauth.yaml",
          "--v=5",
          "--endpoint=unix://opt/nomad/data/client/csi/monolith/beegfs-plugin0/csi.sock",
          "--node-id=node-${NOMAD_ALLOC_INDEX}",
        ]
        privileged = true
      }
      csi_plugin {
        id = "beegfs-plugin0"
        type = "monolith"
        mount_dir = "/opt/nomad/data/client/csi/monolith/beegfs-plugin0"
      }
      resources {
        cpu = 256
        memory = 128
      }
    }
  }
}
```

```
ID                  = 1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e
Eval ID             = d40a547b
Name                = beegfs.csi[0]
Node ID             = 4f3d8916
Node Name           = b5
Job ID              = beegfs
Job Version         = 0
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 10h31m ago
Modified            = 10h31m ago

Task "plugin" is "running"
Task Resources
CPU        Memory           Disk     Addresses
0/256 MHz  6.5 MiB/128 MiB  300 MiB  

Task Events:
Started At     = 2022-04-26T04:43:11Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type                   Description
2022-04-26T04:43:11Z  Plugin became healthy  plugin: beegfs-plugin0
2022-04-26T04:43:11Z  Started                Task started by client
2022-04-26T04:43:11Z  Task Setup             Building Task Directory
2022-04-26T04:43:11Z  Received               Task received by client
```

```
$ docker ps -a
CONTAINER ID   IMAGE                             COMMAND                  CREATED        STATUS        PORTS     NAMES
7f85adbb6b0b   netapp/beegfs-csi-driver:v1.2.1   "beegfs-csi-driver -…"   11 hours ago   Up 11 hours             plugin-1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e
```

```json
[
    {
        "Id": "7f85adbb6b0b453d5d3e888889991ec528baa120afdb4b0b3a60f02b59324a6c",
        "Created": "2022-04-26T04:43:11.528855254Z",
        "Path": "beegfs-csi-driver",
        "Args": [
            "--driver-name=beegfs.csi.netapp.com",
            "--client-conf-template-path=/host/etc/beegfs/beegfs-client.conf",
            "--cs-data-dir=/opt/nomad/data/client/csi/monolith/beegfs-plugin0",
            "--config-path=/local/csi-beegfs-config.yaml",
            "--connauth-path=/secrets/csi-beegfs-connauth.yaml",
            "--v=5",
            "--endpoint=unix://opt/nomad/data/client/csi/monolith/beegfs-plugin0/csi.sock",
            "--node-id=node-0"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 3743,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2022-04-26T04:43:11.742808522Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:a8414e83431d0ca80b8db3aae569bc1497b6b059b33577f2f83e4caecc076361",
        "ResolvConfPath": "/var/lib/docker/containers/7f85adbb6b0b453d5d3e888889991ec528baa120afdb4b0b3a60f02b59324a6c/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/7f85adbb6b0b453d5d3e888889991ec528baa120afdb4b0b3a60f02b59324a6c/hostname",
        "HostsPath": "/var/lib/docker/containers/7f85adbb6b0b453d5d3e888889991ec528baa120afdb4b0b3a60f02b59324a6c/hosts",
        "LogPath": "/var/lib/docker/containers/7f85adbb6b0b453d5d3e888889991ec528baa120afdb4b0b3a60f02b59324a6c/7f85adbb6b0b453d5d3e888889991ec528baa120afdb4b0b3a60f02b59324a6c-json.log",
        "Name": "/plugin-1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "unconfined",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/opt/nomad/data/alloc/1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e/alloc:/alloc",
                "/opt/nomad/data/alloc/1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e/plugin/local:/local",
                "/opt/nomad/data/alloc/1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e/plugin/secrets:/secrets"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {
                    "max-file": "2",
                    "max-size": "2m"
                }
            },
            "NetworkMode": "default",
            "PortBindings": null,
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "host",
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": true,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
                "label=disable"
            ],
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 256,
            "Memory": 134217728,
            "NanoCpus": 0,
            "CgroupParent": "cpuset",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": -1,
            "MemorySwappiness": 0,
            "OomKillDisable": false,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "Mounts": [
                {
                    "Type": "bind",
                    "Source": "/",
                    "Target": "/host",
                    "ReadOnly": true,
                    "BindOptions": {}
                },
                {
                    "Type": "bind",
                    "Source": "/opt/nomad/data/client/csi/plugins/1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e",
                    "Target": "/opt/nomad/data/client/csi/monolith/beegfs-plugin0",
                    "BindOptions": {
                        "Propagation": "rshared"
                    }
                },
                {
                    "Type": "bind",
                    "Source": "/opt/nomad/data/client/csi/monolith/beegfs-plugin0",
                    "Target": "/local/csi",
                    "BindOptions": {
                        "Propagation": "rshared"
                    }
                },
                {
                    "Type": "bind",
                    "Source": "/dev",
                    "Target": "/dev",
                    "BindOptions": {
                        "Propagation": "rprivate"
                    }
                }
            ],
            "MaskedPaths": null,
            "ReadonlyPaths": null
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/cba60653c4ce6743570250e571dce17dbe302dfe849a4de9101834dab4ff846e-init/diff:/var/lib/docker/overlay2/3414806ddbd29b6a4d3f8541a00deaff76bc8b66bf28f6cb92fecc65f216abad/diff:/var/lib/docker/overlay2/b276823b9b0577260de440c35fc6fbad7b060452ae5374804bc713794de3c10d/diff:/var/lib/docker/overlay2/4974a8f2a57e177f41c053ef1af77f17fa045d8535d8119ed252c17c3034145e/diff",
                "MergedDir": "/var/lib/docker/overlay2/cba60653c4ce6743570250e571dce17dbe302dfe849a4de9101834dab4ff846e/merged",
                "UpperDir": "/var/lib/docker/overlay2/cba60653c4ce6743570250e571dce17dbe302dfe849a4de9101834dab4ff846e/diff",
                "WorkDir": "/var/lib/docker/overlay2/cba60653c4ce6743570250e571dce17dbe302dfe849a4de9101834dab4ff846e/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/dev",
                "Destination": "/dev",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/opt/nomad/data/alloc/1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e/alloc",
                "Destination": "/alloc",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/opt/nomad/data/alloc/1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e/plugin/local",
                "Destination": "/local",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/opt/nomad/data/alloc/1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e/plugin/secrets",
                "Destination": "/secrets",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/",
                "Destination": "/host",
                "Mode": "",
                "RW": false,
                "Propagation": "rslave"
            },
            {
                "Type": "bind",
                "Source": "/opt/nomad/data/client/csi/plugins/1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e",
                "Destination": "/opt/nomad/data/client/csi/monolith/beegfs-plugin0",
                "Mode": "",
                "RW": true,
                "Propagation": "rshared"
            },
            {
                "Type": "bind",
                "Source": "/opt/nomad/data/client/csi/monolith/beegfs-plugin0",
                "Destination": "/local/csi",
                "Mode": "",
                "RW": true,
                "Propagation": "rshared"
            }
        ],
        "Config": {
            "Hostname": "7f85adbb6b0b",
            "Domainname": "",
            "User": "0",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "CSI_ENDPOINT=unix:///opt/nomad/data/client/csi/monolith/beegfs-plugin0/csi.sock",
                "NOMAD_ALLOC_DIR=/alloc",
                "NOMAD_ALLOC_ID=1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e",
                "NOMAD_ALLOC_INDEX=0",
                "NOMAD_ALLOC_NAME=beegfs.csi[0]",
                "NOMAD_CPU_LIMIT=256",
                "NOMAD_DC=dc1-f2",
                "NOMAD_GROUP_NAME=csi",
                "NOMAD_JOB_ID=beegfs",
                "NOMAD_JOB_NAME=beegfs",
                "NOMAD_MEMORY_LIMIT=128",
                "NOMAD_NAMESPACE=default",
                "NOMAD_PARENT_CGROUP=/nomad",
                "NOMAD_REGION=global",
                "NOMAD_SECRETS_DIR=/secrets",
                "NOMAD_TASK_DIR=/local",
                "NOMAD_TASK_NAME=plugin",
                "PATH=/netapp://usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
            ],
            "Cmd": [
                "--driver-name=beegfs.csi.netapp.com",
                "--client-conf-template-path=/host/etc/beegfs/beegfs-client.conf",
                "--cs-data-dir=/opt/nomad/data/client/csi/monolith/beegfs-plugin0",
                "--config-path=/local/csi-beegfs-config.yaml",
                "--connauth-path=/secrets/csi-beegfs-connauth.yaml",
                "--v=5",
                "--endpoint=unix://opt/nomad/data/client/csi/monolith/beegfs-plugin0/csi.sock",
                "--node-id=node-0"
            ],
            "Image": "netapp/beegfs-csi-driver:v1.2.1",
            "Volumes": null,
            "WorkingDir": "/",
            "Entrypoint": [
                "beegfs-csi-driver"
            ],
            "OnBuild": null,
            "Labels": {
                "com.hashicorp.nomad.alloc_id": "1aab2b04-1ecb-754f-3f7a-dcd7cfc3a85e",
                "description": "BeeGFS CSI Driver",
                "maintainers": "NetApp",
                "revision": "v1.2.1-0-g316c1cd"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "95252430e6b58cd1788ecf80a737109d95249674b0455d64637640bfea105259",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/95252430e6b5",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "b088a0b0a4a011f0c9c2eb88b8ff37e288956b139e37f512bb45b1c6d8a19f26",
            "Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.2",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:02",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "5be822a1bfacba34251a8d16e1c47102f0c9c9bfde84112b0652ae1e2bf5ba4a",
                    "EndpointID": "b088a0b0a4a011f0c9c2eb88b8ff37e288956b139e37f512bb45b1c6d8a19f26",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:02",
                    "DriverOpts": null
                }
            }
        }
    }
]
```
ejweber commented 2 years ago

I am actively reworking our Nomad support now (though only in my spare time for the moment). I have a Nomad cluster up and have started to work through the issues you experienced. As best as I can tell, there HAVE been changes to the way Nomad handles CSI paths since our original implementation, and, as I guessed above, the unique need for our driver and the host to agree on the full path to configuration files is causing problems.

I fixed your initial CreateVolume issue with a new bind mount (/opt/nomad/client/csi/monolith/beegfs-plugin0:/opt/nomad/client/csi/monolith/beegfs-plugin0) that allows the controller service (running in the container) and beegfs-ctl/mount (outside the container) to agree on the location of beegfs-client.conf. There were some additional code changes required to make this workable, so I don't recommend trying it with a v1.2.2 container.
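In plugin.nomad terms, that amounts to an extra stanza along these lines (a sketch of the idea, not the final v1.3.0 manifest):

```hcl
config {
  # Bind the controller's data directory into the container at the identical
  # path, so a beegfs-client.conf written by the driver is visible to
  # beegfs-ctl/mount on the host at the same absolute path.
  mount {
    type   = "bind"
    source = "/opt/nomad/client/csi/monolith/beegfs-plugin0"
    target = "/opt/nomad/client/csi/monolith/beegfs-plugin0"
  }
}
```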

The node service issue you ran into is a bigger challenge. Nomad is now bind mounting /opt/nomad/client/csi/monolith/beegfs-plugin0:/local/csi automatically for CSI drivers and providing staging_target_paths in NodeStageVolume like /local/csi/.... (For what it's worth, Kubernetes provides absolute staging_target_paths like /var/lib/kubelet/plugins/beegfs.csi.netapp.com, which are much easier for us to deal with.) Many drivers don't care, as all their userspace utilities run inside the driver container and have a synchronized view of the file system. However, we chose not to package beegfs-ctl and the core utilities in our container.

We're talking through fixes, but we may need something like an additional command-line argument that lets the driver rewrite the staging_target_paths it is given before attempting to mount.

ejweber commented 2 years ago

We have identified a twofold path to get Nomad working again.

  1. We created https://github.com/hashicorp/nomad/issues/13263 and https://github.com/hashicorp/nomad/pull/13919 to make it possible for the driver (as it exists today) to deal with the staging_target_paths and target_paths Nomad provides. It's not clear if/when these changes will be incorporated into Nomad.
  2. We are completely reworking the Nomad manifests to make use of the proposed Nomad changes (and to just work better in general). These changes will ship in the next version (v1.3.0) of the driver, but they will only work with our internal builds of Nomad until our proposed changes are incorporated upstream.
ejweber commented 1 year ago

The changes we introduced in v1.3.0 addressed the known Nomad issues. Additional cleanup is being done in v1.4.0. I'm going to go ahead and close this one for now, but please feel free to open a followup issue with additional feedback if/when it makes sense.