dragonflyoss / Dragonfly2

Dragonfly is an open source P2P-based file distribution and image acceleration system. It is hosted by the Cloud Native Computing Foundation (CNCF) as an Incubating Level Project.
https://d7y.io
Apache License 2.0

Slow Image Pull Despite Hitting Local Cache with dfdaemon #3455

Open liuyuxuan0723 opened 3 weeks ago

liuyuxuan0723 commented 3 weeks ago

I've deployed Dragonfly in a Kubernetes cluster using the Helm chart. I've configured it to proxy a private image registry. Here are my steps:

  1. On a random node, pulling a 160MB image for the first time takes 10 seconds.
  2. In dfdaemon's var/lib/dragonfly directory, I can see the cached pieces. Monitoring shows the dragonfly_scheduler_traffic metric with both the backtosource and remotepeer traffic types.
  3. After manually executing crictl rmi to remove the test image from the node and pulling again, it still takes 10 seconds. The dfdaemon logs seem to indicate a cache hit, but the speed remains slow.
  4. Monitoring again shows no local_peer traffic type in the dragonfly_scheduler_traffic metric, even though the dfdaemon log seems to show a local cache hit: [image]
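
To make the comparison between steps 1 and 3 reproducible, the two pulls can be timed with a small script. This is only a minimal sketch: it assumes crictl is on the PATH, and the helper names (time_command, throughput_mb_s, compare_pulls) and the image reference passed in are hypothetical, not part of Dragonfly.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(
        cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start


def throughput_mb_s(size_mb, seconds):
    """Average throughput in MB/s for a transfer of size_mb that took seconds."""
    return size_mb / seconds


def compare_pulls(image, size_mb):
    """Pull, remove, and pull again; return (first, second) throughput in MB/s.

    Assumes the image is not yet present on the node, so that the first
    pull is a real download rather than a no-op.
    """
    first = time_command(["crictl", "pull", image])
    time_command(["crictl", "rmi", image])
    second = time_command(["crictl", "pull", image])
    return throughput_mb_s(size_mb, first), throughput_mb_s(size_mb, second)
```

With the numbers above, throughput_mb_s(160, 10) gives 16 MB/s for both pulls, which is why the second pull does not look like it was served from a local cache.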

My Questions:

  1. Why is the pull speed slow despite hitting the cache? It doesn’t differ much from pulling directly.
  2. Does the dragonfly_scheduler_traffic metric collect local_peer traffic? Or is there another metric to monitor local_peer traffic? (I noticed the Rust version of the client exposes related metrics.)
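
One way to check question 2 is to take the scheduler's metrics output in Prometheus text format and sum dragonfly_scheduler_traffic per traffic type. The sketch below assumes the traffic type is carried in a label named type; that label name and the function name traffic_by_type are assumptions, so adjust them to whatever the real output shows.

```python
import re
from collections import defaultdict

# One sample line in the Prometheus text format: name{labels} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value"
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def traffic_by_type(metrics_text, metric="dragonfly_scheduler_traffic", label="type"):
    """Sum the samples of `metric`, grouped by the value of `label`.

    The default label name "type" is an assumption, not confirmed by the issue.
    """
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)
```

If the returned dictionary has no local_peer key after the second pull, the scheduler metric is not recording that traffic type, which matches what I see in monitoring.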

My Configuration:

version = 2
disabled_plugins = []
imports = []
oom_score = -999
required_plugins = []
root = '/cce/containerd'
state = '/run/containerd'
[debug]
  address = '/run/containerd/debug.sock'
  level = 'info'
[plugins]
  [plugins.'io.containerd.grpc.v1.cri']
    enable_selinux = false
    enable_tls_streaming = false
    max_concurrent_downloads = 10
    sandbox_image = 'registry.baidubce.com/cce-public/pause:3.1'
    stream_server_address = '127.0.0.1'
    stream_server_port = '0'
    [plugins.'io.containerd.grpc.v1.cri'.cni]
      bin_dir = '/opt/cni/bin'
      conf_dir = '/etc/cni/net.d'
      conf_template = ''
    [plugins.'io.containerd.grpc.v1.cri'.containerd]
      default_runtime_name = 'runc'
      [plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes]
        [plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.runc]
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_type = 'io.containerd.runc.v2'
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"

Environment:

gaius-qi commented 3 weeks ago

@jim3ma