NetApp / trident

Storage orchestrator for containers
Apache License 2.0

Trident failed to properly remove multipath link to PV on docker swarm worker #894

Open vlgenf71 opened 3 months ago

vlgenf71 commented 3 months ago

Describe the bug When a persistent volume is mounted by a service on a swarm worker node, and we modify the swarm service to mount another persistent volume, this can cause high IO wait on the first swarm worker when we remove the first persistent volume.

Environment We use Trident + ONTAP Select iSCSI to consume persistent volumes for our services on Docker swarm clusters.


To Reproduce Steps to reproduce the behavior: start.sh creates a docker swarm service with a persistent volume

# Volumes
export SERVICE_TEST_VOLUME=TestVolume1
export SERVICE_TEST_VOLUME_SIZE='1gb'

vol1=`docker volume inspect $SERVICE_TEST_VOLUME | wc -c`
if [ $vol1 -gt 3 ]
then
  echo "$SERVICE_TEST_VOLUME exists"
else
  echo "Creating volume $SERVICE_TEST_VOLUME"
  docker volume create --driver=netapp --name=$SERVICE_TEST_VOLUME -o size=$SERVICE_TEST_VOLUME_SIZE -o fileSystemType=ext4  -o spaceReserve=volume
  docker run --rm -v $SERVICE_TEST_VOLUME:/data busybox rmdir /data/lost+found
fi
docker stack deploy -c docker-compose.yml --resolve-image=always --prune --with-registry-auth SERVICE_TEST
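
The docker-compose.yml referenced above is not reproduced here; as a rough sketch only, a minimal stack file for this kind of test could be generated like this (the service name, image and container mount path are placeholders, not our actual file):

# Hypothetical minimal docker-compose.yml; our real file is not shown in this
# report, so the service name, image and mount path here are guesses.
# The unquoted heredoc bakes in the currently exported volume name, so the
# file must be regenerated after SERVICE_TEST_VOLUME changes.
cat > docker-compose.yml <<EOF
version: "3.8"
services:
  test:
    image: busybox
    command: ["sh", "-c", "while true; do sleep 3600; done"]
    volumes:
      - testdata:/data

volumes:
  testdata:
    external: true
    name: ${SERVICE_TEST_VOLUME}
EOF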

We deploy this service on our swarm cluster. The swarm manager starts this service on worker node A:

[root@nodeA:~]# mount |grep testv
/dev/mapper/3600a098056303030313f526b682f4279 on /local/docker-data/plugins/b8fe688a4fd41d4af97f5de3ce33dee1f7f862d89ba982eec79bf5c785b93c9c/propagated-mount/netappdvp_testvolume type ext4 (rw,relatime,stripe=16)
/dev/mapper/3600a098056303030313f526b682f4279 on /local/docker-data/plugins/b8fe688a4fd41d4af97f5de3ce33dee1f7f862d89ba982eec79bf5c785b93c9c/propagated-mount/netappdvp_testvolume type ext4 (rw,relatime,stripe=16)
[root@nodeA:~]# multipath -ll
3600a098056303030313f526b682f4279 dm-8 NETAPP,LUN C-Mode
size=954M features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 4:0:0:227 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 3:0:0:227 sdb 8:16 active ready running

Then we modify the volume name to TestVolume2 and redeploy the service

export SERVICE_TEST_VOLUME=TestVolume2
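
For completeness, the redeploy is just a re-run of the same deployment steps once the new name is exported (a sketch, assuming the same start.sh as above):

# With SERVICE_TEST_VOLUME now set to TestVolume2, regenerate the compose file
# if it embeds the volume name (see the sketch above), then re-run the script:
./start.sh
docker stack ps SERVICE_TEST   # check where the swarm scheduler placed the new task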

The service is stopped on node A. NetApp Trident creates a new volume, TestVolume2. The service is started on another swarm worker node: node B.

On node A we can no longer see TestVolume1 with "mount |grep TestVolume1", but there is still multipath information for it on node A:

[root@nodeA:~]# mount |grep testv
[root@nodeA:~]# multipath -ll
3600a098056303030313f526b682f4279 dm-8 NETAPP,LUN C-Mode
size=954M features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 4:0:0:227 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 3:0:0:227 sdb 8:16 active ready running

Then, on one of the swarm managers, we run "docker volume rm TestVolume1":

[root@nodeA:~]# multipath -ll
3600a098056303030313f526b682f4279 dm-8 NETAPP,LUN C-Mode
size=954M features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 4:0:0:227 sdc 8:32 **failed faulty running**
`-+- policy='service-time 0' prio=0 status=enabled
  `- 3:0:0:227 sdb 8:16 **failed faulty running**

[root@nodeA:~]# top
top - 18:28:57 up 1 day,  2:02,  2 users,  load average: 0.80, 0.30, 0.10
Tasks: 310 total,   1 running, 309 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.3 sy,  0.0 ni, 82.9 id, **16.6 wa**,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7656.0 total,   5421.9 free,   1101.3 used,   1402.1 buff/cache
MiB Swap:   6144.0 total,   6144.0 free,      0.0 used.   6554.7 avail Mem

To remove the high IO wait, we have to use the dmsetup command:

[root@nodeA:~]# dmsetup -f remove 3600a098056303030313f526b682f4279
[root@nodeA:~]# multipath -ll
[root@nodeA:~]# top
top - 18:29:50 up 1 day,  2:03,  2 users,  load average: 0.97, 0.43, 0.16
Tasks: 306 total,   1 running, 305 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  1.9 sy,  0.0 ni, 97.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7656.0 total,   5454.4 free,   1070.0 used,   1400.7 buff/cache
MiB Swap:   6144.0 total,   6144.0 free,      0.0 used.   6586.0 avail Mem
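
A more general sketch of that manual cleanup (the WWID comes from the multipath -ll output above; on some hosts multipath -f alone may be enough without dmsetup):

# Sketch of the manual cleanup on the worker, generalised from the commands
# above. MAP is the WWID reported by 'multipath -ll' once every path is failed.
MAP=3600a098056303030313f526b682f4279

# Flush the multipath map; fall back to removing the dm device directly.
multipath -f "$MAP" || dmsetup remove -f "$MAP"

# Optional: drop the now-orphaned SCSI path devices (sdb and sdc in the output above).
for dev in sdb sdc; do
  echo 1 > "/sys/block/$dev/device/delete"
done

# Confirm the map and the IO wait are gone.
multipath -ll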

Expected behavior Trident should clear the multipath link to the unused persistent volume before deleting the volume on the ONTAP backend. It is not clear to me whether docker swarm must call the Trident plugin on each swarm worker to do this, or whether docker swarm only has to call the Trident plugin on a swarm manager, and the Trident plugin on that manager then has to call the Trident plugins on every swarm worker node.


vlgenf71 commented 3 months ago

I dug into the Trident source code, especially func Unmount() in the frontend/docker/plugin.go file.

There is a comment there that says something different from the docker plugin specification (link to the Unmount function in plugin.go); here are the two comment lines:

// No longer detaching and removing iSCSI session here because it was causing issues with 'docker cp'. 
// See https://github.com/moby/moby/issues/34665

Comments in the moby issue explain that the storage plugin should count each time a volume is mounted into a container, so that it unmounts the volume from the system only when no container is using it any more, rather than never detaching and removing the iSCSI session as stated in this comment.

The docker documentation for volume plugins also states that the plugin has to count the number of times it is asked to mount one particular volume. https://docs.docker.com/engine/extend/plugins_volume/#volumedrivermount
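
To make the expectation concrete, here is a small illustration of what that reference counting implies on a single node (my own sketch, not taken from the docs):

# Two containers share the same netapp volume on one node.
docker run -d --name c1 -v TestVolume1:/data busybox sh -c 'while true; do sleep 3600; done'
docker run -d --name c2 -v TestVolume1:/data busybox sh -c 'while true; do sleep 3600; done'

docker rm -f c1
mount | grep netappdvp   # the plugin must keep the mount: c2 still uses the volume

docker rm -f c2
mount | grep netappdvp   # only now may the plugin unmount and detach the iSCSI device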

The more I look into the code and enable debug logs, the more I think there is an issue with the Trident plugin: it should remove the iSCSI device mount when no container is using a particular volume on a node any more.

Could someone tell me if I am right?

I managed to activate Trident plugin logs with

docker plugin set netapp:latest debug=true

and by adding this to /etc/netappdvp/config.json:

    "debugTraceFlags": {"api":true, "method":true}

I can see all calls to the NetApp API, but I still cannot see some of the debug logs that I see in the source code. For instance, https://github.com/NetApp/trident/blob/master/utils/mount_linux.go#L338-L339 or https://github.com/NetApp/trident/blob/master/frontend/docker/plugin.go#L487C2-L495C58 Is there any way to activate these logs, and where could I see them?
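
Assuming a managed plugin's stdout/stderr is forwarded to the docker daemon log, this is where I would expect those debug lines to land on a systemd host:

# Assumption: dockerd forwards managed plugin output to its own log,
# tagged with the plugin ID.
PLUGIN_ID=$(docker plugin inspect --format '{{.Id}}' netapp:latest)
journalctl -u docker.service | grep "$PLUGIN_ID" | tail -n 50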

mravi-na commented 3 months ago

Hi @vlgenf71, Thanks for the detailed analysis and the references. From the moby issue I see that the conclusion is that docker has to handle plugin failure scenarios, and another comment says this workflow (not removing iSCSI connections until volume deletion) causes challenges in swarm environments. Nevertheless, it appears the current workflow resolves one issue but may have challenges with swarm. It may require more debate and prioritization to address in the Trident plugin implementation.

And, regarding the debug logs at https://github.com/NetApp/trident/blob/master/utils/mount_linux.go#L338-L339: this is in the CSI workflow and not in docker. I can see the logs with my workflows.

https://github.com/NetApp/trident/blob/master/frontend/docker/plugin.go#L487C2-L495C58 For these logs to be visible, you might want to change --disable_audit_log to false while installing Trident, or you can edit daemonset.apps/trident-node-linux to set it to false. By default this flag is set to true, and hence you could not see those logs in the docker workflow.
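
For the daemonset route, something along these lines (this only applies to a Kubernetes/CSI install, and the trident namespace is an assumption):

# Kubernetes/CSI install only, not the docker plugin.
# Assumes Trident runs in the "trident" namespace.
kubectl -n trident edit daemonset.apps/trident-node-linux
# ...then set the --disable_audit_log container argument to false.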

And, I see you have been using a very old Trident image, 20.10.16. Would you mind updating to the latest, since there have been changes in Trident 23.10 and later in the iSCSI strengthening and multipath device removal areas?

Thanks.

vlgenf71 commented 3 months ago

Hi @mravi-na, Thank you for your answer.

How can I set "--disable_audit_log" to false?

I deploy the Trident plugin with this command:

docker plugin install --grant-all-permissions --alias netapp netapp/trident-plugin:23.07.1 config=config.json

I made a typo in my post: 20.10.16 is the docker version I use; I deployed Trident plugin version 23.07.1 :-)

vlgenf71 commented 3 months ago

Hi @mravi-na,

Once again, thank you for the time you spent giving me an answer.

It's not clear to me why the Umount() func in mount_linux.go would not be called in docker plugin mode.

https://github.com/NetApp/trident/blob/master/utils/mount_linux.go#L338-L339
This is in the CSI workflow and not in docker. I can see the logs with my workflows.

I understand that the entry point of the Trident plugin's unmount operation is this Unmount() function: https://github.com/NetApp/trident/blob/master/frontend/docker/plugin.go#L484-L515

This function calls p.orchestrator.DetachVolume

Then I can see that utils.Umount(ctx, mountpoint) is called: https://github.com/NetApp/trident/blob/master/core/orchestrator_core.go#L3626

I can see only 3 implementations of this utils.Umount function; the one in the mount_linux.go file seems the most likely to me:

utils/mount_darwin.go: func Umount(ctx context.Context, mountpoint string) (err error)

utils/mount_linux.go: func Umount(ctx context.Context, mountpoint string) (err error)

utils/mount_windows.go: func Umount(ctx context.Context, mountpoint string) (err error)

mravi-na commented 3 months ago

Hi @vlgenf71 Sorry for the confusion :( I meant to say that I tested in CSI workflows and I could see the debug logs. I have not tested in a docker setup yet.