datashim-io / datashim

A kubernetes based framework for hassle free handling of datasets
http://datashim-io.github.io/datashim
Apache License 2.0

Download speed from internet into s3 PVC on EKS is slow #247

Open mahi402 opened 1 year ago

mahi402 commented 1 year ago

Please find the logs below. The bucket is in the same region.

Defaulted container "driver-registrar" out of: driver-registrar, csi-s3
I0330 18:04:33.986814 1 main.go:164] Version: v2.3.0
I0330 18:04:33.986839 1 main.go:165] Running node-driver-registrar in mode=registration
I0330 18:04:33.987521 1 main.go:189] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0330 18:04:33.987541 1 connection.go:154] Connecting to unix:///csi/csi.sock
I0330 18:04:35.979600 1 main.go:196] Calling CSI driver to discover driver name
I0330 18:04:35.979628 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0330 18:04:35.979632 1 connection.go:184] GRPC request: {}
I0330 18:04:35.984351 1 connection.go:186] GRPC response: {"name":"ch.ctrox.csi.s3-driver","vendor_version":"v1.1.1"}
I0330 18:04:35.984410 1 connection.go:187] GRPC error:
I0330 18:04:35.984416 1 main.go:206] CSI driver name: "ch.ctrox.csi.s3-driver"
I0330 18:04:35.984434 1 node_register.go:52] Starting Registration Server at: /registration/ch.ctrox.csi.s3-driver-reg.sock
I0330 18:04:35.984594 1 node_register.go:61] Registration Server started at: /registration/ch.ctrox.csi.s3-driver-reg.sock
I0330 18:04:35.984645 1 node_register.go:91] Skipping healthz server because HTTP endpoint is set to: ""
I0330 18:04:36.393034 1 main.go:100] Received GetInfo call: &InfoRequest{}
I0330 18:04:36.393226 1 main.go:107] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi-s3/registration"
I0330 18:04:36.427337 1 main.go:118] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

srikumar003 commented 1 year ago

@mahi402 thanks, but could you paste the logs of the csi-s3 container instead?
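
For context: the log above starts with `Defaulted container "driver-registrar"`, which is what `kubectl logs` prints when the pod has several containers and none is named. A minimal sketch of selecting the csi-s3 container explicitly with `-c`, wrapped in Python. The pod name and the `dlf` namespace are placeholders assumed for illustration and do not come from this thread.

```python
import subprocess


def csi_s3_logs_cmd(pod, namespace="dlf", container="csi-s3"):
    """Build the kubectl command that reads logs from one named container.

    `namespace="dlf"` is an assumption; use the namespace of your install.
    """
    return ["kubectl", "logs", pod, "-n", namespace, "-c", container]


if __name__ == "__main__":
    # "csi-s3-xxxxx" is a hypothetical pod name; look up the real one with
    # `kubectl get pods -n <namespace> -o wide`.
    cmd = csi_s3_logs_cmd("csi-s3-xxxxx")
    print(" ".join(cmd))
    # Uncomment to actually fetch the logs from the cluster:
    # subprocess.run(cmd, check=True)
```

Since the DaemonSet runs one such pod per node, the command would need to be repeated for each pod of interest.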

mahi402 commented 1 year ago

logscontainer2.txt logscontainer.txt

Please find the logs of the 2 containers running under the DaemonSet.

mahi402 commented 1 year ago

Could you please let me know if you were able to go through the logs above?

mahi402 commented 1 year ago

If I upload a file from the S3 PVC to S3, it is very fast, but downloading from the internet into the S3 PVC is slow.

mahi402 commented 1 year ago

Can you please let me know which type of driver you are using? Is it rclone? We want to enable a caching mechanism for faster speed.

srikumar003 commented 1 year ago

@mahi402 I looked through your log files, and it seems one of your csi-s3 daemons (in logscontainer2.txt) is not able to connect to the S3 endpoint, while the other (ip-10-0-1-25.us-west-2.compute.internal) can. You could try restarting the csi-s3 daemon pod on the malfunctioning node.

We use goofys as the mounter by default but there is a backend implementation for rclone that is not tested yet.

mahi402 commented 1 year ago

Thank you for the reply. In your opinion, will the performance be faster if we use rclone? And can you please let me know how we can convert this dlf.yaml to use rclone?

mahi402 commented 1 year ago

Hi, we are also seeing high data transfer costs in AWS when we use the csi-s3 dataset as a PVC on EKS. Can you please let me know if there is an alternative implementation?

srikumar003 commented 1 year ago

@mahi402 csi-s3 should not make any decisions that impact transfer costs. Could you please try accessing the bucket from inside the cluster with the aws s3 command, using the same endpoints you configured for the dataset, and let us know whether the transfer costs differ in that case? Thanks
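
One way to run that comparison might look like the sketch below: it streams an object through `aws s3 cp ... -` (the `-` sends the object to stdout) with an explicit `--endpoint-url`, counts the bytes, and reports throughput. The bucket, key, and endpoint values are placeholders, not taken from this thread. The script only measures bytes and time; the transfer costs themselves still have to be read from AWS billing.

```python
import subprocess
import time


def s3_download_cmd(bucket, key, endpoint_url):
    """Build an `aws s3 cp` command that streams one object to stdout."""
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", "-",
            "--endpoint-url", endpoint_url]


def timed_download(cmd, chunk_size=1 << 20):
    """Run cmd, discard its stdout, and return (bytes_read, seconds)."""
    start = time.monotonic()
    total = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        # Read in fixed-size chunks so large objects are never held in memory.
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            total += len(chunk)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return total, time.monotonic() - start


def throughput_mib_s(num_bytes, seconds):
    """Convert a byte count and a duration into MiB per second."""
    return num_bytes / (1024 * 1024) / seconds


if __name__ == "__main__":
    # Placeholder values: substitute the bucket, key, and endpoint that the
    # Dataset is configured with.
    cmd = s3_download_cmd("my-bucket", "path/to/object",
                          "https://s3.us-west-2.amazonaws.com")
    print(" ".join(cmd))
    # Uncomment on a node or pod that has the AWS CLI and credentials:
    # size, secs = timed_download(cmd)
    # print(f"{size} bytes in {secs:.1f}s = {throughput_mib_s(size, secs):.1f} MiB/s")
```

Running the same script from a pod inside the cluster and reading the same object through the mounted PVC would give two numbers to compare alongside the billing data.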