Open zsksy123 opened 8 months ago
The following are all the manager logs:
core.log gin.log grpc.log stderr.log stdout.log
The following are all the scheduler logs:
core.log gc.log grpc.log job.log stderr.log stdout.log
The following are all the seedPeer logs:
core.log gin.log grpc.log stderr.log stdout.log
I filtered the logs of every dfdaemon pod and found no error logs:
for i in $(kl get pod | grep dfdaemon | awk '{print $1}'); do echo "$i error log"; kl exec -it "$i" cat /var/log/dragonfly/daemon/core.log | grep -i error; done
dragonfly-dfdaemon-2zb6g error log
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "dfdaemon" out of: dfdaemon, wait-for-scheduler (init), mount-netns (init), update-docker-config (init)
dragonfly-dfdaemon-4c5tm error log
dragonfly-dfdaemon-59jpj error log
dragonfly-dfdaemon-86t88 error log
dragonfly-dfdaemon-b5wq2 error log
dragonfly-dfdaemon-bgc78 error log
dragonfly-dfdaemon-brw29 error log
dragonfly-dfdaemon-bx8fx error log
dragonfly-dfdaemon-c75fm error log
dragonfly-dfdaemon-cffmb error log
dragonfly-dfdaemon-cqvqv error log
dragonfly-dfdaemon-dzwvt error log
dragonfly-dfdaemon-flwkk error log
dragonfly-dfdaemon-fq4rf error log
dragonfly-dfdaemon-h5tg6 error log
dragonfly-dfdaemon-hj2mx error log
dragonfly-dfdaemon-hnxt8 error log
dragonfly-dfdaemon-k2trd error log
dragonfly-dfdaemon-kqxgt error log
dragonfly-dfdaemon-l4v9l error log
dragonfly-dfdaemon-ltpl9 error log
dragonfly-dfdaemon-mfvrj error log
dragonfly-dfdaemon-mlfph error log
dragonfly-dfdaemon-pf6ll error log
dragonfly-dfdaemon-plgt4 error log
dragonfly-dfdaemon-rxxb8 error log
dragonfly-dfdaemon-sswg2 error log
dragonfly-dfdaemon-wlg5c error log
dragonfly-dfdaemon-x6bs8 error log
dragonfly-dfdaemon-xds4g error log
dragonfly-dfdaemon-z6s8f error log
dragonfly-dfdaemon-zhb5z error log
dragonfly-dfdaemon-zlfct error log
@zsksy123 Seed Peer Logs:
The task has 97 pieces in total.
{"level":"debug","ts":"2024-01-03 08:38:50.319","caller":"storage/local_storage.go:234","msg":"update total pieces: 97","task":"1240b3604ff90029979f795456360e221597801bafb1541cc733f77a1031483a","peer":"10.96.23.232-1-200cdb76-9346-40cb-84cd-1c4c0512554b_Seed","component":"localTaskStore"}
Each piece is 15728640 bytes (15 MiB) long.
{"level":"debug","ts":"2024-01-03 08:39:08.721","caller":"storage/local_storage.go:182","msg":"wrote 15728640 bytes to file /var/lib/dragonfly/1240b3604ff90029979f795456360e221597801bafb1541cc733f77a1031483a/10.96.23.232-1-200cdb76-9346-40cb-84cd-1c4c0512554b_Seed/data, piece 1, start 15728640, length: 15728640","task":"1240b3604ff90029979f795456360e221597801bafb1541cc733f77a1031483a","peer":"10.96.23.232-1-200cdb76-9346-40cb-84cd-1c4c0512554b_Seed","component":"localTaskStore"}
While writing piece 68, the write fails with an unexpected EOF error after 7434240 bytes.
{"level":"error","ts":"2024-01-03 08:48:36.544","caller":"peer/piece_manager.go:292","msg":"put piece to storage failed, piece num: 68, wrote: 7434240, error: unexpected EOF","peer":"10.96.23.232-1-200cdb76-9346-40cb-84cd-1c4c0512554b_Seed","task":"1240b3604ff90029979f795456360e221597801bafb1541cc733f77a1031483a","component":"PeerTask","trace":"f1934cc823a9f6a835ed7d4deb7e5f78","stacktrace":"d7y.io/dragonfly/v2/client/daemon/peer.(*pieceManager).processPieceFromSource\n\t/go/src/d7y.io/dragonfly/v2/client/daemon/peer/piece_manager.go:292\nd7y.io/dragonfly/v2/client/daemon/peer.(*pieceManager).downloadKnownLengthSource\n\t/go/src/d7y.io/dragonfly/v2/client/daemon/peer/piece_manager.go:490\nd7y.io/dragonfly/v2/client/daemon/peer.(*pieceManager).DownloadSource\n\t/go/src/d7y.io/dragonfly/v2/client/daemon/peer/piece_manager.go:475\nd7y.io/dragonfly/v2/client/daemon/peer.(*peerTaskConductor).backSource\n\t/go/src/d7y.io/dragonfly/v2/client/daemon/peer/peertask_conductor.go:505\nd7y.io/dragonfly/v2/client/daemon/peer.(*peerTaskConductor).pullPieces\n\t/go/src/d7y.io/dragonfly/v2/client/daemon/peer/peertask_conductor.go:527"}
While piece 68 was downloading, the HTTP range request was interrupted, which caused the write to fail. Please check why the server connection was interrupted.
@gaius-qi @jim3ma If you pull the image manually with the docker pull command, you will see that a 2.3G layer retries the pull, but it eventually succeeds. If the connection to the uhub repository is lost, can we handle this situation in our code?
@zsksy123 Can you add support for the feature? Thanks.
Image name: uhub.service.ucloud.cn/openbayes_algopub/inference_llm:0.0.2
Preheating the large image failed. Job details are as follows: