Open vlgenf71 opened 8 months ago
I dug into the Trident source code, especially the func Unmount() in Trident's plugin.go file.
There is a comment there which says something different from the docker volume plugin specification (link to the Unmount function in plugin.go). Here are the two comment lines:
// No longer detaching and removing iSCSI session here because it was causing issues with 'docker cp'.
// See https://github.com/moby/moby/issues/34665
Comments in the moby issue explain that the storage plugin should count each time a volume is mounted into a container, so that it unmounts the volume from the system only when no container is using it anymore — not that it should stop detaching and removing the iSCSI session altogether, as the comment above states.
The docker documentation for storage plugins also states that the plugin has to count the number of times Mount is called for one particular volume: https://docs.docker.com/engine/extend/plugins_volume/#volumedrivermount
The more I look into the code with debug logging activated, the more I think there is an issue with the Trident plugin: it should remove the iSCSI device mount when no container on a node is using a particular volume anymore.
Could someone tell me if I am right?
I managed to activate Trident plugin logs with
docker plugin set netapp:latest debug=true
and by adding to /etc/netappdvp/config.json
"debugTraceFlags": {"api":true, "method":true}
I can see all calls to the NetApp API, but I still cannot see some of the debug logs that I see in the source code. For instance, https://github.com/NetApp/trident/blob/master/utils/mount_linux.go#L338-L339 or https://github.com/NetApp/trident/blob/master/frontend/docker/plugin.go#L487C2-L495C58. Is there any way to activate these logs, and where could I see them?
Hi @vlgenf71, thanks for the detailed analysis and the references. From the moby issue I see that the conclusion is that docker has to handle plugin failure scenarios, and another comment says this workflow (not removing iSCSI connections until volume deletion) causes challenges in a swarm environment. So it appears the current workflow resolves one issue but may cause problems with swarm. It may require more debate and prioritization to address in the Trident plugin implementation.
And, regarding the debug logs at https://github.com/NetApp/trident/blob/master/utils/mount_linux.go#L338-L339: this is in the CSI workflow, not docker. I can see the logs with my workflows.
For the logs at https://github.com/NetApp/trident/blob/master/frontend/docker/plugin.go#L487C2-L495C58 to be visible, you might want to change --disable_audit_log to false while installing Trident, or you can edit daemonset.apps/trident-node-linux to set it to false. By default this flag is set to true, which is why you could not see those logs in the docker workflow.
And, I see you have been using a very old Trident image (20.10.16); would you mind updating to the latest, since there have been changes in Trident 23.10 and later in the areas of iSCSI strengthening and multipath device removal?
Thanks.
Hi @mravi-na, Thank you for your answer.
How can I set --disable_audit_log to false?
I deploy the Trident plugin with this command:
docker plugin install --grant-all-permissions --alias netapp netapp/trident-plugin:23.07.1 config=config.json
I made a typo in my post: 20.10.16 is the docker version I use; I deployed Trident plugin version 23.07.1 :-)
Hi @mravi-na,
Once again, thank you for the time you spent giving me an answer.
It's not clear to me why the Umount() func in mount_linux.go would not be called in docker plugin mode:
> https://github.com/NetApp/trident/blob/master/utils/mount_linux.go#L338-L339
> This is in CSI work flow and not in docker. I can see the logs with my work flows.
I understand that the entry point of the Trident plugin's Unmount is this Unmount() function: https://github.com/NetApp/trident/blob/master/frontend/docker/plugin.go#L484-L515
This function calls p.orchestrator.DetachVolume,
and then I can see that utils.Umount(ctx, mountpoint) is called: https://github.com/NetApp/trident/blob/master/core/orchestrator_core.go#L3626
I can see only 3 implementations of this utils.Umount function; the one in the mount_linux.go file seems the most likely to me:
- utils/mount_darwin.go: `func Umount(ctx context.Context, mountpoint string) (err error) {`
- utils/mount_linux.go: `func Umount(ctx context.Context, mountpoint string) (err error) {`
- utils/mount_windows.go: `func Umount(ctx context.Context, mountpoint string) (err error) {`
Hi @vlgenf71, sorry for the confusion :( I meant to say that I tested in CSI workflows and could see the debug logs there. I have not tested in a docker setup yet.
Describe the bug
When a persistent volume is mounted by a service on a swarm worker node, and we modify the swarm service to mount another persistent volume, this can cause high IO wait on the first swarm worker when we remove the first persistent volume.
Environment
We use Trident + ONTAP Select iSCSI to consume persistent volumes for our services on Docker swarm clusters.
To Reproduce
Steps to reproduce the behavior: a start.sh script creates a docker swarm service with a persistent volume.
1. We deploy this service on our swarm cluster; the swarm manager starts the service on worker node A.
2. Then we modify the volume name to TestVolume2 and redeploy the service.
3. The service is stopped on node A, NetApp Trident creates a new volume TestVolume2, and the service is started on another swarm worker node, node B.
4. On node A we can no longer see TestVolume1 with `mount | grep TestVolume1`, but there is still some multipath info for it on node A.
5. Then, on one of the swarm managers, we run `docker volume rm TestVolume1`.
6. To remove the high IO wait we have to use the dmsetup command.
Expected behavior
Trident should clear the multipath links to an unused persistent volume before deleting the volume on the ONTAP backend. It's not clear to me whether docker swarm must call the Trident plugin on each swarm worker node to do this, or whether docker swarm only has to call the Trident plugin on the swarm manager, which then calls the Trident plugins on every swarm worker node.