Closed: weicheng59 closed this issue 1 year ago.
Could you post your hostfile?
It is a single machine. I got the IP from the following command:
export NODE_ADDR=$(ifconfig -a|grep inet|grep -v 127.0.0.1|grep -v inet6|awk '{print $2;}'|sed -n '1P')
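The one-liner above keeps the first non-loopback IPv4 `inet` address that `ifconfig -a` reports. Below is a minimal sketch that wraps the same filter chain in a function reading from stdin, so you can check what it picks on sample output before the address goes into a hostfile. The function name `first_ipv4` and the sample interface text are hypothetical and for illustration only.

```shell
#!/usr/bin/env bash
# Same filter chain as the NODE_ADDR one-liner (hypothetical wrapper name),
# reading ifconfig-style text on stdin instead of calling ifconfig directly.
first_ipv4() {
  grep inet |            # keep lines that mention inet or inet6
    grep -v 127.0.0.1 |  # drop the loopback address
    grep -v inet6 |      # drop IPv6 lines
    awk '{print $2;}' |  # on "inet <addr> netmask ..." lines, field 2 is the address
    sed -n '1P'          # keep only the first match
}

# Hypothetical sample in the net-tools 2.x ifconfig layout.
sample='eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.12  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::1  prefixlen 64  scopeid 0x20<link>
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0'

printf '%s\n' "$sample" | first_ipv4   # prints 10.0.0.12
```

On a real machine this becomes `export NODE_ADDR=$(ifconfig -a | first_ipv4)`. Older net-tools builds print `inet addr:10.0.0.12`, and on those the second field would be `addr:10.0.0.12`. It is therefore worth echoing `$NODE_ADDR` before writing it anywhere.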
I think this has been reported in the group chat before: does your hostfile contain a blank line?
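If a blank line in the hostfile is the cause, it is easy to check. Below is a minimal sketch, assuming the hostfile holds one `<ip> slots=<n>` entry per line. The `slots=8` value, the address, and the helper names are illustrative and are not taken from this issue. It lists empty or whitespace-only lines and writes a cleaned copy.

```shell
#!/usr/bin/env bash
# Print the line numbers of empty or whitespace-only lines in a hostfile ($1).
# [[:space:]] also matches a stray carriage return left by a CRLF editor.
find_blank_lines() {
  grep -n '^[[:space:]]*$' "$1" | cut -d: -f1
}

# Write a copy of the hostfile ($1) to $2 with those lines removed.
strip_blank_lines() {
  sed '/^[[:space:]]*$/d' "$1" > "$2"
}

# Hypothetical single-node hostfile with a trailing blank line.
tmp=$(mktemp -d)
printf '10.0.0.12 slots=8\n\n' > "$tmp/hostfile"
find_blank_lines "$tmp/hostfile"                          # prints 2
strip_blank_lines "$tmp/hostfile" "$tmp/hostfile.clean"
cat "$tmp/hostfile.clean"                                 # prints 10.0.0.12 slots=8
```

Writing to a separate file rather than editing in place keeps the original hostfile intact, so you can compare the two if the launch still fails.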
Closing this issue for now. Please reopen it if the problem persists. Thanks.
System Info
When running pre-training with bash dist_trigger_docker.sh hostfile Aquila-pretrain.yaml aquila-7b test0, bmtrain reports this error:
![9201686728893 pic](https://github.com/FlagAI-Open/FlagAI/assets/8345745/68d38f2e-ff4f-46ef-a3f6-115e4848ca5a)
However, the same approach runs normally when I try it locally. Local environment: CUDA 11.7, torch 1.13.1, FlagAI 1.7.1, bmtrain 0.2.2.
Information
Tasks
examples folder (such as T5/AltCLIP, ...)
Reproduction
1. cd examples/Aquila, then run bash dist_trigger_docker.sh hostfile Aquila-pretrain.yaml aquila-7b test0
2. Check the log file. It contains the errors shown in the screenshot above.
Expected behavior
Pre-training starts.