Considering that Nomad supports standard CSI interfaces and claims that the Kubernetes CSI plugins work out of the box, I don't see any reason why it wouldn't work. The bigger question is how to configure the plugin to work as intended.
Hi @ressu, thank you for the response. I agree that the various documentation claims that synology-csi should work with Nomad, but I'm not sure if that is true or not. Have you heard of anyone successfully using synology-csi with Nomad? I'd love to learn from their experience. When I tried to deploy this synology-csi container into Nomad, I got the error:
[FATAL] [driver/grpc.go:91] Failed to listen: listen unix //var/lib/kubelet/plugins/csi.san.synology.com/csi.sock: bind: no such file or directory
As you can see, that's pointing to a Kubernetes-specific path, /var/lib/kubelet/. It seems that this plugin has a lot of hard-coded Kubernetes configuration that doesn't look like it can be configured otherwise; for example, the csiEndpoint mentioned above has kubelet hard-coded into it: https://github.com/SynologyOpenSource/synology-csi/blob/dc05a795b79b911ec5882c3c837a7779cf3576a8/main.go#L23
I am new to CSI plugins, so maybe I just need to learn more about them in order to properly configure synology-csi to work with Nomad. But from what I can tell it doesn't seem like it will work. What do you think?
Unfortunately I don't know how Nomad invokes the CSI daemons, which would give me a better idea of how to solve this. The path for the CSI socket can be overridden with the -e or --endpoint flag, as seen here: https://github.com/SynologyOpenSource/synology-csi/blob/dc05a795b79b911ec5882c3c837a7779cf3576a8/main.go#L98
Many of the default features are handled by the generic CSI containers, so that would change the situation a bit too.
That being said, if you can adjust the startup of the CSI plugin and add --endpoint=/csi/csi.sock to the startup, you might be able to get something going.
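For a Nomad docker task, that flag would go into the task's args, roughly like this sketch (the image tag and the mount_dir value are assumptions for illustration, not values confirmed at this point in the thread):

task "plugin" {
  driver = "docker"

  config {
    # assumed image tag; use whichever synology-csi release you run
    image = "synology/synology-csi:v1.0.0"

    args = [
      # point the CSI socket at the directory Nomad exposes via csi_plugin.mount_dir
      "--endpoint", "unix:///csi/csi.sock",
    ]
  }

  csi_plugin {
    id        = "synology"
    type      = "monolith"
    mount_dir = "/csi"
  }
}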
Awesome, that's super helpful @ressu! I'll close this for now. If anyone else has advice on properly configuring this plugin to work with Nomad, please reach out :)
Hey @ressu, another question for you. I was able to make some progress and get the synology-csi plugin running in Nomad by setting --endpoint=unix:///csi/csi.sock. I've registered the Synology volume and was trying to deploy a job using that volume when I got the following error:
2022-01-05T21:14:44Z [INFO] [driver/utils.go:104] GRPC call: /csi.v1.Node/NodeStageVolume
2022-01-05T21:14:44Z [INFO] [driver/utils.go:105] GRPC request: {"staging_target_path":"/csi/staging/scada-test/ro-file-system-single-node-reader-only","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["noatime"]}},"access_mode":{"mode":2}},"volume_id":"1"}
2022-01-05T21:14:44Z [ERROR] [driver/utils.go:108] GRPC error: rpc error: code = Internal desc = rpc error: code = NotFound desc = Volume[1] is not found
It's unable to find the volume on our Synology DSM: Volume[1] is not found. I also tried registering the volume as "Volume 1", "/volume1", and combinations like that, but no luck. Our Synology device just has 1 volume, called Volume 1, in the DSM dashboard. I'm not sure what the volume_id is supposed to be. Do you know how I can find what the volume_id is for our Synology volume?
I checked my logs and it seems that the volume_id is the UUID of the LUN:
2021-12-23T23:18:29Z [INFO] [driver/utils.go:104] GRPC call: /csi.v1.Node/NodeStageVolume
2021-12-23T23:18:29Z [INFO] [driver/utils.go:105] GRPC request: {"staging_target_path":"/var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pod-config/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"dsm":"10.5.1.2"},"volume_id":"c189b0b4-5bbd-40d6-b1b8-bb8645218402"}
I think the dsm entry in the volume context is also required so that the CSI driver knows which DSM to contact, but I'm not certain.
Also, make sure that your volumes have an appropriate prefix as defined in https://github.com/SynologyOpenSource/synology-csi/blob/dc05a795b79b911ec5882c3c837a7779cf3576a8/pkg/models/dsm.go#L19-L21
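In Nomad terms, that means the registration spec needs the LUN UUID as the external_id and the DSM address in the context block, roughly like this minimal sketch (the id/name and placeholder values are illustrative):

id          = "pod-config"
name        = "pod-config"
type        = "csi"
plugin_id   = "synology"
# the UUID of the LUN on the DSM, not the DSM volume name
external_id = "<LUN UUID>"

context {
  # which DSM the plugin should contact for this volume
  dsm = "<DSM IP address>"
}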
Thanks @ressu. Apologies for my ignorance but how would I find the UUID of the LUN?
Oh, right! It's not immediately visible. I think I found the UUID by inspecting the source code in the admin UI. I had to dig around there when I migrated my old volumes. The normal volume creation automation in Kubernetes figures it out automatically for volumes created by the CSI, but for externally created volumes the pain of searching the HTML in the admin console was real :(
Oh boy that is a bit hacky. Ok I'll look around. Do you remember what part of the UI you were able to find it in?
Yeah, it's very hacky. I preferred the third-party CSI mechanism over this one, but you use what you can :smile:
The way you can find the UUID of the volume is in SAN Manager. If you list the LUNs there and look at the source code, the itemid attribute will list the UUID of each volume. It's a bit of a pain to list the volumes, though, since the UI tries to refresh the HTML all the time. So depending on your browser you might need to introduce a breakpoint somewhere to see the actual contents.
The way I got it was in Chrome using the inspector: I created a breakpoint on HTML element subtree modifications, which froze things for long enough to properly copy the UUID from the list.
Awesome! I was able to find the UUID via the itemid in the source code on the SAN Manager page. I hadn't yet created a LUN, so I first did that, and then grabbed the UUID. I am still getting the same error, Volume[763336ca-0f20-4fcf-8e8d-3406168c60fc] is not found, but I'm wondering if it's related to your other comment above:
Also, make sure that your volumes have an appropriate prefix as defined in
The volume prefix - is that something I configure on the DSM side or the Nomad side? I don't see a prefix option in the LUN configuration.
FWIW, the IqnPrefix does appear to be correct.
The prefix goes into the LUN name. I think you also need to create an iSCSI host with the same prefix. Mine are in the form of k8s-csi-<kubernetes volume name>. So the suffix in the name doesn't matter as long as it starts with k8s-csi.
The synology-csi only looks for LUNs with the prefix "k8s-csi". Try to change the LUN name from "LUN-1" to "k8s-csi-LUN-1".
Sweet, thanks to both of you! synology-csi was able to successfully find the volume after using the UUID and creating a LUN and iSCSI host with the name k8s-csi-LUN-1. Now on to the next error, which is:
2022-01-06T18:59:06Z [ERROR] [driver/initiator.go:37] Failed to run iscsiadm session: exit status 1
2022-01-06T18:59:06Z [ERROR] [driver/initiator.go:114] Failed in discovery of the target: Couldn't find hostPath: /host in the CSI container (exit status 1)
2022-01-06T18:59:06Z [ERROR] [driver/utils.go:108] GRPC error: rpc error: code = Internal desc = rpc error: code = Internal desc = Failed to login with target iqn [iqn.2000-01.com.synology:RackStationNYHQ.Target-1.72e4481bb23], err: Couldn't find hostPath: /host in the CSI container (exit status 1)
The chroot.sh script is indicating that the /host directory needs to be available in the container, is that right? Can you elaborate on what's needed here?
https://github.com/SynologyOpenSource/synology-csi/blob/dc05a795b79b911ec5882c3c837a7779cf3576a8/chroot/chroot.sh#L4
fwiw I tried exec-ing into the running synology-csi containers and mkdir /host just to see if making that directory available helped, but synology-csi now says:
err: chroot: can't execute '/usr/bin/env': No such file or directory
The /host directory is a bind mount of the filesystem from the node (the machine that is doing the mounting).
The relevant Kubernetes configurations are https://github.com/SynologyOpenSource/synology-csi/blob/dc05a795b79b911ec5882c3c837a7779cf3576a8/deploy/kubernetes/v1.19/node.yml#L112-L113 and https://github.com/SynologyOpenSource/synology-csi/blob/dc05a795b79b911ec5882c3c837a7779cf3576a8/deploy/kubernetes/v1.19/node.yml#L132-L135
I don't know how the containers are configured for Nomad, but effectively you need to mount / into the container as the directory /host.
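With the Nomad docker driver, that host mount can be expressed roughly like this (a sketch mirroring the Kubernetes hostPath mount linked above, not a confirmed configuration):

config {
  privileged = true

  # bind-mount the node's root filesystem into the container at /host
  mount {
    type     = "bind"
    source   = "/"
    target   = "/host"
    readonly = false
  }
}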
Ok, makes sense. I need to make a Nomad client config change to make the root filesystem from the host available, so I will work on that and then try deploying the job again. My instinct makes me nervous to mount the entire host filesystem inside the container. Would you mind describing why this is needed?
Thanks again @ressu for helping here! I'm hoping that all this information and troubleshooting will be useful to other folks who try to use synology-csi with Nomad.
My instinct makes me nervous to mount the entire host filesystem inside the container. Would you mind describing why this is needed?
Trust me, you're not alone with this one. I wanted to work around the mount in other CSIs, but couldn't find a reliable way :laughing:
As far as I understand, the host mount allows the CSI to act as the host system while the container sandbox is in place. It's a cheap trick used quite often to reduce the complexity of the code whenever there are too many dependencies on the host system. I've seen the same pattern being used in other CSIs and CNIs.
Thanks again @ressu for helping here! I'm hoping that all this information and troubleshooting will be useful to other folks who try to use synology-csi with Nomad.
Happy to help, I'm mostly stabbing in the dark since I've never run Nomad myself. But I'm just happy that you are able to make progress with the hints I'm able to give you.
I need to take a pause on this synology-csi <> Nomad work and will hopefully come back to it at a later date. For now I will close out this issue since all my open questions have been answered. I'll reopen this issue if/when I come back to it and new questions arise. If anyone finds this issue in the future and wants to know how my nomad job.hcl, volume.hcl, and related configuration ended up, please reach out. Thanks again for all the help!
Creating the storage does work, but there is still a problem with the access mode within Nomad:
$ nomad volume status
Container Storage Interface
ID    Name  Plugin ID  Schedulable  Access Mode
test  test  synology   true         <none>
My current configuration for the Nomad CSI plugin job is like this:
job "plugin-synology" {
type = "system"
group "controller" {
task "plugin" {
driver = "docker"
config {
image = "docker.io/synology/synology-csi:v1.0.0"
privileged = true
volumes = [
"local/csi.yaml:/etc/csi.yaml",
"/:/host",
]
args = [
"--endpoint",
"unix://csi/csi.sock",
"--client-info",
"/etc/csi.yaml",
]
}
template {
destination = "local/csi.yaml"
data = <<EOF
---
clients:
- host: 192.168.1.2
port: 8443
https: true
username: nomad
password: <password>
EOF
}
csi_plugin {
id = "synology"
type = "monolith"
mount_dir = "/csi"
}
resources {
cpu = 256
memory = 256
}
}
}
}
and the volume definition for nomad volume create is like this:
id = "test"
name = "test"
type = "csi"
plugin_id = "synology"
capacity_min = "1GiB"
capacity_max = "2GiB"
capability {
access_mode = "single-node-writer"
attachment_mode = "file-system"
}
mount_options {
mount_flags = ["rw"]
}
Hi @mabunixda, I think I ran into a similar issue. I used the nomad volume register command instead of create. I got this error even when I had the access_mode defined within the capability, as you do: Error registering volume: Unexpected response code: 500 (rpc error: validation: missing access mode, missing attachment mode).
Interestingly, when I moved the access_mode and attachment_mode to the top level, outside the capability block, the nomad volume register command worked and the volume had the correct access mode. According to the docs, that's not how it should work, but maybe it's a mistake in the docs or it's changed in more recent versions of Nomad (I'm on 1.0.4). Here's my volume.hcl:
id = "test"
name = "test"
type = "csi"
external_id = "a53b447a-c52b-48e5-9810-943e3b527a68"
plugin_id = "synology"
access_mode = "single-node-reader-only"
attachment_mode = "file-system"
mount_options {
fs_type = "btrfs"
mount_flags = ["noatime"]
}
context {
dsm = "<dsm-ip>"
}
I didn't try the nomad volume create command. Does it actually create the volume in Synology? If so, that's better, because then I don't have to go hunting for the UUID for the external_id field.
@johnnyplaydrums yes, the create actually creates a volume on my Synology, but it doesn't become usable in Nomad.
@mabunixda does putting access_mode and attachment_mode outside the capability block solve that issue for you?
No, because that is not valid syntax for Nomad > 1.1.0.
Ah I see ☹️
Hello everyone, I am quite interested in this thread and will be hitting this wall soon (I have not set up Nomad on this new setup yet). I hope we can make this work together at some point :)
I started working on this also. I took the route of following the Stateful Workloads tutorial and copying what made sense from the synology-csi configs. In my fork, the Nomad stuff is in deploy/nomad/v1.2.5.
I've been able to get a controller and node going. Both appear to be running, connect to DSM, and show no errors in the docker logs.
I can create volumes, and these show up in SAN Manager in DSM with what appear to be the correct settings. In Nomad they also show as Schedulable, but the Access Mode for all volumes is <none>. When I try to run a job that uses one of these volumes, placement fails:
2022-02-03T21:18:51-06:00: Task Group "mysql-server" (failed to place 1 allocation):
* Constraint "missing CSI Volume test2[0]": 1 nodes excluded by filter
At this point I suspect there's a miscommunication between Nomad and the CSI plugin when grabbing the capabilities of a volume, but I'm not sure how to test it. I have set log-level=debug for the controller and node, which does print a lot of data. I'm not sure how to get something similar on the Nomad side; debug-level logging in Nomad doesn't seem to show any of the actual communication with the plugin.
This driver implements CSI but does so with k8s-isms, as you have discovered. I have a pure CSI-based driver that works with Synology (and Nomad) available here: https://github.com/democratic-csi/democratic-csi
Necro'ing this thread since I'm also banging my head against a wall trying to get synology-csi to work on Nomad + my Synology DS220+.
I got Nomad running on my DS220+ with the latest DSM, but when I try to deploy the synology-csi I'm getting the following error messages in the systemd journal:
Mar 26 13:52:49 storage nomad[17771]: 2023-03-26T13:52:49.687+0200 [WARN] client.alloc_runner.task_runner.task_hook.api: error creating task api socket: alloc_id=4969158d-6045-297a-a770-89b47d94e21f task=synology-csi-plugin path=/volume1/homelab/nomad/var/lib/nomad/alloc/4969158d-6045-297a-a770-89b47d94e21f/synology-csi-plugin/secrets/api.sock error="listen unix /volume1/homelab/nomad/var/lib/nomad/alloc/4969158d-6045-297a-a770-89b47d94e21f/synology-csi-plugin/secrets/api.sock: bind: invalid argument"
Mar 26 13:53:41 storage nomad[17771]: 2023-03-26T13:53:41.634+0200 [ERROR] client.alloc_runner.task_runner.task_hook: killing task because plugin failed: alloc_id=4969158d-6045-297a-a770-89b47d94e21f task=synology-csi-plugin error="CSI plugin failed probe: timeout while connecting to gRPC socket: failed to stat socket: stat /volume1/homelab/nomad/var/lib/nomad/client/csi/plugins/4969158d-6045-297a-a770-89b47d94e21f/csi.sock: no such file or directory"
Mar 26 13:53:41 storage nomad[17771]: 2023-03-26T13:53:41.634+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=4969158d-6045-297a-a770-89b47d94e21f task=synology-csi-plugin type="Plugin became unhealthy" msg="Error: CSI plugin failed probe: timeout while connecting to gRPC socket: failed to stat socket: stat /volume1/homelab/nomad/var/lib/nomad/client/csi/plugins/4969158d-6045-297a-a770-89b47d94e21f/csi.sock: no such file or directory" failed=false
Mar 26 13:53:41 storage nomad[17771]: 2023-03-26T13:53:41.886+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=4969158d-6045-297a-a770-89b47d94e21f task=synology-csi-plugin type=Killing msg="CSI plugin did not become healthy before configured 30s health timeout" failed=true
Mar 26 13:53:47 storage nomad[17771]: 2023-03-26T13:53:47.890+0200 [ERROR] client.alloc_runner.task_runner.task_hook: failed to kill task: alloc_id=4969158d-6045-297a-a770-89b47d94e21f task=synology-csi-plugin kill_reason="CSI plugin failed probe: timeout while connecting to gRPC socket: failed to stat socket: stat /volume1/homelab/nomad/var/lib/nomad/client/csi/plugins/4969158d-6045-297a-a770-89b47d94e21f/csi.sock: no such file or directory" error="context canceled"
Nomad is running as root, therefore I think we can exclude permission issues.
Any idea what might cause these issues? The first line with "bind: invalid argument" looks like the culprit to me. I suspect that the Linux version DSM is based on is too old, but some confirmation or a recommendation on how to fix this would be nice.
I was able to successfully get this working with Nomad 1.6.x using iSCSI. As stated above, the SMB portion relies on a k8s secret, which Nomad cannot consume. Here is a working example of a CSI driver job, a volume spec, and a job file consuming that volume, for anyone who might need it. I'll submit a PR for docs at some point.
One thing that may seem obvious, but that I can't find listed anywhere, is that the following packages need to be installed on the host running Nomad:
open-iscsi
lsscsi
sg3-utils
multipath-tools
scsitools
I have not tested exactly which of these are required. I found the list on a blog, installed them all, and it works.
I'm using the monolith plugin type here. There's not really a need to break it out for homelab use, which I assume is what most people are using Synologys for.
Run nomad job run synology-csi.nomad.hcl
job "synology-csi" {
datacenters = ["dc1"]
type = "system"
node_pool = "default"
group "controller" {
task "plugin" {
driver = "docker"
config {
image = "synology/synology-csi:v1.1.2"
privileged = true
network_mode = "host"
mount {
type = "bind"
source = "/"
target = "/host"
readonly = false
}
mount {
type = "bind"
source = "local/csi.yaml"
target = "/etc/csi.yaml"
readonly = true
}
args = [
"--endpoint",
"unix://csi/csi.sock",
"--client-info",
"/etc/csi.yaml"
]
}
template {
data = <<EOH
---
clients:
- host: <ip of synology host>
port: 5000
https: false
username: <username with admin privileges>
password: <password>
EOH
destination = "local/csi.yaml"
}
csi_plugin {
id = "synology"
type = "monolith"
mount_dir = "/csi"
}
resources {
cpu = 500
memory = 256
}
}
}
}
Run: nomad volume create example-volume.nomad.hcl
id = "example"
name = "example"
type = "csi"
plugin_id = "synology"
capacity_min = "1GiB"
capacity_max = "2GiB"
capability {
access_mode = "single-node-writer"
attachment_mode = "file-system"
}
#mount/fstab options https://linux.die.net/man/8/mount
mount_options {
fs_type = "btrfs"
mount_flags = ["noatime"]
}
#if you have multiple storage pools and/or volumes, specify where to mount the container volume/LUN or else it'll just pick one for you
parameters {
location = "/volume2"
}
Validate it's created and healthy by running nomad volume status. Then here is an example job (synology-csi-example.nomad.hcl) that consumes the volume:
job "synology-csi-example" {
  datacenters = ["dc1"]
  node_pool   = "default"

  group "web" {
    count = 1

    volume "example_volume" {
      type            = "csi"
      read_only       = false
      source          = "example"
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

    network {
      port "http" {
        static = 8888
        to     = 80
      }
    }

    task "nginx" {
      driver = "docker"

      volume_mount {
        volume      = "example_volume"
        destination = "/config"
        read_only   = false
      }

      config {
        image = "nginxdemos/hello:latest"
        ports = ["http"]
      }
    }
  }
}
Run nomad job run synology-csi-example.nomad.hcl
???
Profit!
@awanaut thank you for this information, I'm trying to set it up myself. Can you share some info regarding the config on the Synology? Is it still required to create a LUN with the k8s-csi prefix as mentioned above?
@awanaut thank you for this information, I'm trying to set it up myself. Can you share some info regarding the config on the Synology? Is it still required to create a LUN with the k8s-csi prefix as mentioned above?
Nope! You just need to make sure the volume is available under the "parameters" stanza. nomad volume create will create the LUN on the backend. If you have a LUN already created and you want to use that, you'd use nomad volume register; however, some of the config file parameters are different. Check here: https://developer.hashicorp.com/nomad/docs/commands/volume/register.
@awanaut Thanks a ton for figuring all this out. I was able to get it set up and working on my cluster.
I ran into one minor issue that I'm wondering if others have seen. When mounting an iSCSI volume into a task, the mount point is owned by root (uid/gid=0), with permissions of 755. This causes some apps, such as postgres, to fail since they run as a non-root user and try to chown their data directory on startup.
I got around this by creating a sidecar pre-start task that fixed the permissions on the volume before the main task runs, but I'm wondering if there's a better/cleaner way. I've experimented with a few settings without much luck.
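For anyone hitting the same thing, the workaround looks roughly like this: a prestart task in the same group that chowns the mounted volume before the main task starts (the image, UID/GID, and volume name are placeholders for illustration, not values from this thread):

task "fix-perms" {
  driver = "docker"

  lifecycle {
    hook    = "prestart"
    sidecar = false
  }

  volume_mount {
    volume      = "example_volume"
    destination = "/data"
    read_only   = false
  }

  config {
    image   = "busybox:1.36"
    command = "chown"
    # 999:999 is only an example; use the UID/GID your app runs as
    args    = ["-R", "999:999", "/data"]
  }
}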
@awanaut @s4v4g3 does the snapshot work in combination with Nomad?
@s4v4g3 Might this be related to the func createTargetMountPath, which creates the mount folder and sets the permissions to 0750? If this is what causes the issue, where could we place a config item to change this so it's not hardcoded anymore? Any thoughts?
@awanaut @s4v4g3 does the snapshot work in combination with Nomad?
I have not tested CSI snapshots to see if they just use Synology's snapshots. I imagine they do.
@awanaut, any suggestions on how to define the Nomad job? I'm struggling to convert the Kubernetes spec to Nomad for snapshots.
Hello! I was wondering if synology-csi works with Nomad? At first glance it would appear there is only support for Kubernetes, but I just wanted to double check. Thank you