dragonflyoss / Dragonfly

This repository has be archived and moved to the new repository https://github.com/dragonflyoss/Dragonfly2.
https://d7y.io
Apache License 2.0
6k stars 774 forks source link

dragonfly is unstable with high latency #1047

Open zhm9484 opened 4 years ago

zhm9484 commented 4 years ago

Ⅰ. Issue Description

I tested dragonfly in a kubernetes cluster of 300 nodes which were 2c4g vms. The purpose of the test was to see the stability of dragonfly in distributing files across multiple nodes. It turned out that dragonfly became unstable after running a while. The testing procedure is as follows:

  1. Supernode starts with arg "--max-bandwidth 10000000MB".
  2. Every node has a downloader with dfget on it. Every downloader watches a key called "/new/file" in etcd.
  3. A container uses "python3 -m http.server" to serve http requests. It also generates new random file which is 50MB and modifies the value of key "/new/file" in etcd every 30 seconds.
  4. Downloaders get the etcd events simultaneously, then trigger the "dfget --url= --notbs --locallimit=10G --totallimit=10G --node= -o " command.

Ⅱ. Describe what happened

Everything seems fine at the beginning. But about 50 minutes later, the metrics became unstable. Dfget failures started to happen, and the download latency increased. image

Ⅲ. Describe what you expected to happen

For the nodes in cluster were stable, the metrics should be stable too.

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?

I can provide the logs of supernode and dfclient if you need.

Ⅵ. Environment:

lowzj commented 4 years ago

Could you paste related logs of dfget? see FAQ to get logs.

zhm9484 commented 4 years ago

Could you paste related logs of dfget? see FAQ to get logs.

I can only find some logs like this.

2019-11-04 15:23:54.407 INFO sign:347-1572852234.407 : get cmd params:["dfget" "--url=http://127.0.0.1:12347/27.model" "--notbs" "--locallimit=10G" "--totallimit=10G" "--node=dfsupernode:8002" "-o" "/save/27.model"]
2019-11-04 15:23:54.407 INFO sign:347-1572852234.407 : get init config:{"url":"http://127.0.0.1:12347/27.model","output":"/save/27.model","pattern":"p2p","notbs":true,"nodes":["dfsupernode:8002"],"localLimit":"10GB","minRate":"64KB","totalLimit":"10GB","clientQueueSize":6}
2019-11-04 15:23:54.407 INFO sign:347-1572852234.407 : target file path:/save/27.model
2019-11-04 15:23:55.407 ERRO sign:347-1572852234.407 : Connect to node:dfsupernode:8002 error: dial tcp: i/o timeout
2019-11-04 15:23:55.414 INFO sign:347-1572852234.407 : runtimeVariable: {"MetaPath":"/root/.small-dragonfly/meta/host.meta","SystemDataDir":"/root/.small-dragonfly/data","DataDir":"/root/.small-dragonfly/data","RealTarget":"/save/27.model","TargetDir":"/save","TempTarget":"/save/dfget-347-1572852234.407.tmp-086707113","Cid":"172.22.146.145-347-1572852234.407","TaskURL":"http://127.0.0.1:12347/27.model","TaskFileName":"27.model-347-1572852234.407","LocalIP":"172.22.146.145","PeerPort":0,"FileLength":-1,"DataExpireTime":180000000000,"ServerAliveTime":300000000000}
2019-11-04 15:23:55.415 INFO sign:347-1572852234.407 : local http result:27.model-347-1572852234.407 err:<nil>, port:57838 path:/check/
2019-11-04 15:23:55.415 INFO sign:347-1572852234.407 : use peer server on port:57838
2019-11-04 15:23:55.415 INFO sign:347-1572852234.407 : do register to one of [dfsupernode:8002 dfsupernode:8002]
2019-11-04 15:24:00.415 INFO sign:347-1572852234.407 : do register to dfsupernode:8002, res:null error:timeout
2019-11-04 15:24:00.415 ERRO sign:347-1572852234.407 : register to node:dfsupernode:8002 error:timeout
2019-11-04 15:24:05.415 INFO sign:347-1572852234.407 : do register to dfsupernode:8002, res:null error:timeout
2019-11-04 15:24:05.415 ERRO sign:347-1572852234.407 : register to node:dfsupernode:8002 error:timeout
2019-11-04 15:24:05.415 ERRO sign:347-1572852234.407 : register fail:{"Code":-100,"Msg":"timeout"}
2019-11-04 15:24:05.415 WARN sign:347-1572852234.407 : register fail but try to download from source, reason:1({"Code":-100,"Msg":"timeout"})
2019-11-04 15:24:05.415 WARN sign:347-1572852234.407 : invalid download timeout(0.000s)
2019-11-04 15:24:05.415 INFO sign:347-1572852234.407 : download FAIL from supernode [] cost:11.008s length:-1 reason:1001

But what bothers me is the high latency which was about 30s. And the logs are like this:

2019-11-04 15:24:24.533 INFO sign:357-1572852264.534 : get cmd params:["dfget" "--url=http://127.0.0.1:12347/28.model" "--notbs" "--locallimit=10G" "--totallimit=10G" "--node=dfsupernode:8002" "-o" "/save/28.model"]
2019-11-04 15:24:24.534 INFO sign:357-1572852264.534 : get init config:{"url":"http://127.0.0.1:12347/28.model","output":"/save/28.model","pattern":"p2p","notbs":true,"nodes":["dfsupernode:8002"],"localLimit":"10GB","minRate":"64KB","totalLimit":"10GB","clientQueueSize":6}
2019-11-04 15:24:24.534 INFO sign:357-1572852264.534 : target file path:/save/28.model
2019-11-04 15:24:24.544 INFO sign:357-1572852264.534 : runtimeVariable: {"MetaPath":"/root/.small-dragonfly/meta/host.meta","SystemDataDir":"/root/.small-dragonfly/data","DataDir":"/root/.small-dragonfly/data","RealTarget":"/save/28.model","TargetDir":"/save","TempTarget":"/save/dfget-357-1572852264.534.tmp-287104021","Cid":"172.22.146.145-357-1572852264.534","TaskURL":"http://127.0.0.1:12347/28.model","TaskFileName":"28.model-357-1572852264.534","LocalIP":"172.22.146.145","PeerPort":0,"FileLength":-1,"DataExpireTime":180000000000,"ServerAliveTime":300000000000}
2019-11-04 15:24:24.545 INFO sign:357-1572852264.534 : local http result:28.model-357-1572852264.534 err:<nil>, port:57838 path:/check/
2019-11-04 15:24:24.545 INFO sign:357-1572852264.534 : use peer server on port:57838
2019-11-04 15:24:24.545 INFO sign:357-1572852264.534 : do register to one of [dfsupernode:8002 dfsupernode:8002]
2019-11-04 15:24:24.589 INFO sign:357-1572852264.534 : do register to dfsupernode:8002, res:{"code":200,"msg":"success","data":{"taskId":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","fileLength":52428802,"pieceSize":4194304}} error:<nil>
2019-11-04 15:24:24.589 INFO sign:357-1572852264.534 : do register result:{"code":200,"msg":"success","data":{"taskId":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","fileLength":52428802,"pieceSize":4194304}} and cost:0.044s
2019-11-04 15:24:24.589 INFO sign:357-1572852264.534 : downloading piece:{"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"","range":"","result":502,"status":700,"pieceSize":0,"pieceNum":0}
2019-11-04 15:24:24.595 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"","range":"","result":502,"status":700,"pieceSize":0,"pieceNum":0}) result:{"code":602,"msg":"clientSucCount:0,cdnSucCount:0: {\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 0.064s
2019-11-04 15:24:24.666 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"","range":"","result":502,"status":700,"pieceSize":0,"pieceNum":0}) result:{"code":602,"msg":"clientSucCount:0,cdnSucCount:0: {\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 0.157s
2019-11-04 15:24:24.825 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"","range":"","result":502,"status":700,"pieceSize":0,"pieceNum":0}) result:{"code":602,"msg":"clientSucCount:0,cdnSucCount:0: {\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 0.351s
2019-11-04 15:24:25.178 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"","range":"","result":502,"status":700,"pieceSize":0,"pieceNum":0}) result:{"code":602,"msg":"clientSucCount:0,cdnSucCount:0: {\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 0.401s
2019-11-04 15:24:25.583 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"","range":"","result":502,"status":700,"pieceSize":0,"pieceNum":0}) result:{"code":602,"msg":"clientSucCount:0,cdnSucCount:0: {\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.152s
2019-11-04 15:24:26.737 ERRO sign:357-1572852264.534 : pull piece task fail:{"code":502,"msg":"failed to get dfgetTask with taskID (93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e) clientID (172.22.146.145-357-1572852264.534): failed to get key 172.22.146.145-357-1572852264.534@93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e from map: {\"Code\":0,\"Msg\":\"data not found\"}"} and will migrate
2019-11-04 15:24:26.737 INFO sign:357-1572852264.534 : do register to one of [dfsupernode:8002]
2019-11-04 15:24:26.739 INFO sign:357-1572852264.534 : do register to dfsupernode:8002, res:{"code":200,"msg":"success","data":{"taskId":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","fileLength":52428802,"pieceSize":4194304}} error:<nil>
2019-11-04 15:24:26.739 INFO sign:357-1572852264.534 : do register result:{"code":200,"msg":"success","data":{"taskId":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","fileLength":52428802,"pieceSize":4194304}} and cost:0.002s
2019-11-04 15:24:26.741 INFO sign:357-1572852264.534 : pull rate result:10737418240 cost:184.018µs
2019-11-04 15:24:26.808 INFO sign:357-1572852264.534 : downloading piece:{"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.210-342-1572852264.503","range":"0-4194303","result":503,"status":701,"pieceSize":4194304,"pieceNum":0}
2019-11-04 15:24:26.824 WARN sign:357-1572852264.534 : request piece result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"}
2019-11-04 15:24:26.824 INFO sign:357-1572852264.534 : downloading piece:{"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.146.70-341-1572852264.555","range":"4194304-8388607","result":503,"status":701,"pieceSize":4194304,"pieceNum":1}
2019-11-04 15:24:26.826 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.146.70-341-1572852264.555","range":"4194304-8388607","result":503,"status":701,"pieceSize":4194304,"pieceNum":1}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 0.914s
2019-11-04 15:24:27.742 WARN sign:357-1572852264.534 : request piece result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"}
2019-11-04 15:24:27.742 INFO sign:357-1572852264.534 : downloading piece:{"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}
2019-11-04 15:24:27.743 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.545s
2019-11-04 15:24:29.290 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.033s
2019-11-04 15:24:30.325 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.096s
2019-11-04 15:24:31.423 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.101s
2019-11-04 15:24:32.526 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.520s
2019-11-04 15:24:34.048 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 0.828s
2019-11-04 15:24:34.878 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 0.976s
2019-11-04 15:24:35.856 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.357s
2019-11-04 15:24:37.215 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.233s
2019-11-04 15:24:38.450 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.054s
2019-11-04 15:24:39.506 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.185s
2019-11-04 15:24:40.693 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.575s
2019-11-04 15:24:42.270 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.466s
2019-11-04 15:24:43.738 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.433s
2019-11-04 15:24:45.173 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.246s
2019-11-04 15:24:46.421 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.247s
2019-11-04 15:24:47.670 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 0.890s
2019-11-04 15:24:48.562 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.199s
2019-11-04 15:24:49.763 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.278s
2019-11-04 15:24:51.069 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.070s
2019-11-04 15:24:52.141 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 0.875s
2019-11-04 15:24:53.018 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.540s
2019-11-04 15:24:54.560 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.524s
2019-11-04 15:24:56.086 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.145.217-346-1572852264.531","range":"12582912-16777215","result":503,"status":701,"pieceSize":4194304,"pieceNum":3}) result:{"code":602,"msg":"{\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.017s
2019-11-04 15:24:57.106 INFO sign:357-1572852264.534 : pull rate result:10737418240 cost:300.566µs
2019-11-04 15:24:57.161 INFO sign:357-1572852264.534 : downloading piece:{"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.146.115-337-1572852264.531","range":"46137344-50331647","result":503,"status":701,"pieceSize":4194304,"pieceNum":11}
2019-11-04 15:24:57.175 INFO sign:357-1572852264.534 : downloading piece:{"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.146.136-343-1572852264.522","range":"37748736-41943039","result":503,"status":701,"pieceSize":4194304,"pieceNum":9}
2019-11-04 15:24:57.211 INFO sign:357-1572852264.534 : downloading piece:{"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.146.150-340-1572852264.543","range":"29360128-33554431","result":503,"status":701,"pieceSize":4194304,"pieceNum":7}
2019-11-04 15:24:57.218 INFO sign:357-1572852264.534 : downloading piece:{"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.146.138-338-1572852264.532","range":"20971520-25165823","result":503,"status":701,"pieceSize":4194304,"pieceNum":5}
2019-11-04 15:24:57.235 INFO sign:357-1572852264.534 : pull piece task({"taskID":"93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e","superNode":"dfsupernode:8002","dstCid":"172.22.146.138-338-1572852264.532","range":"20971520-25165823","result":503,"status":701,"pieceSize":4194304,"pieceNum":5}) result:{"code":602,"msg":"taskID(93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e) clientID(172.22.146.145-357-1572852264.534): {\"Code\":9,\"Msg\":\"peer should wait\"}"} and sleep 1.468s
2019-11-04 15:24:58.705 INFO sign:357-1572852264.534 : remaining piece to be written count:0
2019-11-04 15:24:59.016 INFO sign:357-1572852264.534 : wait client writer finish cost:0.311,main qu size:2,client qu size:0
2019-11-04 15:24:59.016 INFO sign:357-1572852264.534 : move src:/root/.small-dragonfly/data/28.model-357-1572852264.534 to dst:/save/28.model result:true cost:0.000
2019-11-04 15:24:59.016 INFO sign:357-1572852264.534 : download successfully from dragonfly
2019-11-04 15:24:59.016 INFO sign:357-1572852264.534 : download SUCCESS from supernode [] cost:34.483s length:52428802
lowzj commented 4 years ago

It seems that the supernode at that time didn't work fine or the bandwidth of the supernode was used out. Could you paste the log and the io/network details of supernode at that time?

grep the taskID 93fa39eff4af850e15eeaeab0c8ed8d147e8d9a69a7d024ef3bc1afd76c1e68e to get the related log.