datashim-io / datashim

A kubernetes based framework for hassle free handling of datasets
http://datashim-io.github.io/datashim
Apache License 2.0

"Transport endpoint is not connected" error happening frequently and intermittently #324

Open rrehman-hbk opened 10 months ago

rrehman-hbk commented 10 months ago

We are using datashim to connect to an S3 bucket with an access key and secret. We mount that volume in 5-6 services. Even when the services have not restarted, we see them throw a "Transport endpoint is not connected" error. After we restart a service, it can connect and read data again.

We have installed datashim in the dlf namespace and the dataset in the namespace where our other services run. Pasting the csi-s3 pod logs:

Defaulted container "driver-registrar" out of: driver-registrar, csi-s3
I1214 06:00:34.644398       1 main.go:167] Version: v2.8.0
I1214 06:00:34.644462       1 main.go:168] Running node-driver-registrar in mode=registration
I1214 06:00:34.644928       1 main.go:192] Attempting to open a gRPC connection with: "/csi/csi.sock"
I1214 06:00:34.644954       1 connection.go:164] Connecting to unix:///csi/csi.sock
I1214 06:00:35.646808       1 main.go:199] Calling CSI driver to discover driver name
I1214 06:00:35.646842       1 connection.go:193] GRPC call: /csi.v1.Identity/GetPluginInfo
I1214 06:00:35.646849       1 connection.go:194] GRPC request: {}
I1214 06:00:35.655130       1 connection.go:200] GRPC response: {"name":"ch.ctrox.csi.s3-driver","vendor_version":"v1.1.1"}
I1214 06:00:35.655149       1 connection.go:201] GRPC error: <nil>
I1214 06:00:35.655165       1 main.go:209] CSI driver name: "ch.ctrox.csi.s3-driver"
I1214 06:00:35.655288       1 node_register.go:53] Starting Registration Server at: /registration/ch.ctrox.csi.s3-driver-reg.sock
I1214 06:00:35.655532       1 node_register.go:62] Registration Server started at: /registration/ch.ctrox.csi.s3-driver-reg.sock
I1214 06:00:35.655661       1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I1214 06:00:36.469837       1 main.go:102] Received GetInfo call: &InfoRequest{}
I1214 06:00:36.470165       1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi-s3/registration"
I1214 06:00:36.485115       1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

srikumar003 commented 10 months ago

@rrehman-hbk there seem to be no errors in the logs above. There is one csi-s3 pod per K8s node. Please check all the other ones and paste any lines with errors or warnings.
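Note that the pasted output came from the default `driver-registrar` container (see the "Defaulted container" line); `kubectl logs` accepts `-c csi-s3` to select the other container in the pod. To pull out only warnings and errors, a small filter over the log prefix can help. The following is a minimal sketch, assuming the lines follow the klog-style prefix seen above (a severity letter `I`, `W`, `E`, or `F` followed by the date). The function name `filter_problem_lines` is hypothetical, and the csi-s3 container itself may log in a different format.

```python
import re

# klog-style prefix: severity letter (I, W, E, F), then MMDD and a timestamp.
KLOG_PREFIX = re.compile(r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)")


def filter_problem_lines(lines):
    """Return only the lines whose klog severity is W, E, or F."""
    problems = []
    for line in lines:
        match = KLOG_PREFIX.match(line)
        # Lines without a klog prefix are skipped, not treated as errors.
        if match and match.group(1) in "WEF":
            problems.append(line.rstrip("\n"))
    return problems
```

Passing the saved log lines (for example, the contents of a file opened with `open(...)`) to `filter_problem_lines` leaves only the entries worth pasting into the issue.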

rrehman-hbk commented 10 months ago

This entire instance is on K3s on a VM. We have only one pod. I am attaching csi-attacher logs too.

csi-attacher

Output of kubectl get pods in the dlf namespace where datashim is installed: [image]

The service that gets disconnected from S3 throws the following error: s3-01: Transport endpoint is not connected
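On Linux this message is the text for errno `ENOTCONN`, which is commonly seen when the FUSE process backing a mount has exited while the mount point is still in place. A service could detect that state before reading, as in the minimal sketch below. It assumes the dataset is exposed through a FUSE-based mount; the function name `mount_is_connected` is hypothetical and is not part of datashim.

```python
import errno
import os


def mount_is_connected(path):
    """Return False if stat() on path fails with ENOTCONN, True otherwise.

    Any other OSError (for example, a missing path) is re-raised so that
    it is not mistaken for a disconnected mount.
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return False
        raise
    return True
```

A check like this only reports the condition; it does not remount anything, so the pod would still need a restart or remount once it returns `False`.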

rrehman-hbk commented 10 months ago

More info: When we upload small files (2.4 MB, for example), things work. When we try to upload an 87 MB file, the upload reaches a certain percentage and then fails. When I check S3, I see that it fails after 10 MB. Is this due to a setting somewhere? I can upload the same file directly to the S3 bucket in AWS.

@srikumar003 Is there any config that limits things to 10 MB?
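To narrow down where a large write fails, it can help to reproduce it with a known size and record how far it got. The sketch below is one way to do that, assuming the dataset is mounted at a writable path inside the pod. The function name `write_in_chunks` and its parameters are hypothetical.

```python
def write_in_chunks(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes of zero bytes to path in fixed-size chunks.

    Returns (bytes_handed_to_write, error). The count is what was passed
    to write() before an OSError; data may still be buffered, so a failure
    raised at close time can report more bytes than actually reached storage.
    """
    written = 0
    chunk = b"\0" * chunk_bytes
    try:
        with open(path, "wb") as handle:
            while written < total_bytes:
                size = min(chunk_bytes, total_bytes - written)
                handle.write(chunk[:size])
                handle.flush()
                written += size
    except OSError as exc:
        return written, exc
    return written, None
```

Running it with sizes just below and just above 10 MB against the mounted path would show whether the failure tracks a size threshold, and the returned error carries the errno to include in the report.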

hao-tang-ts commented 2 months ago

Same issue here. @rrehman-hbk, is there any update?