hostfile Search Results

1000+ results
for hostfile


ROCm/DeepSpeed #68

[BUG] I have pulled the docker images, but when I run it, I g…

susie.sun@yz-amd1:~$ docker run -it rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed /bin/bash root@c50e90963e1a:/var/lib/jenkins# deepspeed --num_gpus 1 deploy.py [2023-12-14 01:52:…

sunpian1 updated 6 months ago
17
open-mpi/ompi #7106

mpiexec not allowing more cpus per process in more than one …

Hi, I'm new in github and MPI (mpiexec) usages, so I try to run a process that can run in more than one thread. So, I used hwthreads. But, the problem is that hwthread is just limited to one node, …

rafaeltiveron updated 5 years ago
1
THUDM/CodeGeeX #142

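What does HOSTFILE in scripts/finetune_codegeex.sh mean, and does the data source in DATA_PATH have…

HOSTFILE="" # I don't quite understand what HOSTFILE means # ====== Parameters ====== DATA_PATH="" # Is there an open-source dataset for this data source?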

wangyang135 updated 1 year ago
2
2ndalpha/gasmask #171

Order alphabetically

I have like 30-40 hostfiles and ran a script to create files for all hostfiles and then restarted gasmask. Then it read the files seemingly at random. So it would be good if there was a sort function or t…

spyvingen updated 5 years ago
1
huggingface/optimum-habana #1451

Meta-Llama-3 model text-generation example output is unexpec…

### System Info ```shell deepspeed 0.14.4+hpu.synapse.v1.18.0 optimum-habana 1.14.0 docker image: vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-ins…

aslanxie updated 1 month ago
4
microsoft/PrimeDNS #2

HostfileUpdater: Should Update only when there is a change t…

Currently hostfile updater updates every 60 seconds (or fixed time period) which is not a good design given touching the hostfile clears the local DNS caches. Hence, we would like to make it update on…

Arunothia updated 5 years ago
1
microsoft/DeepSpeed #1331

CUDA_VISIBLE_DEVICES isn't correctly inherited on a SLURM sy…

**Describe the bug** This issue occurs on a SLURM cluster where worker nodes equipped with multiple GPU's are shared amongst users. GPU's are given slot number assignments (for example, on a node wit…

devinrouthuzh updated 7 months ago
8
microsoft/DeepSpeed #4387

How to train inside multiple nodes' Docker containers?

**Describe the bug** **Log output** After configuring the hostfile using pdsh, I use command `deepspeed --num_nodes 2 hostfile=hostfile.txt train.py`, but I find deepspeed logs into the other machine …

chenfengshijie updated 1 year ago
5
haotian-liu/LLaVA #362

When use multi-nodes with zero3, training time increase

### Describe the issue Issue: We collected a large-scale instruction dataset and want to use multi-node training. When using the following script, the training time is too slow and there is no log about time. …

Byshev333 updated 1 year ago
2
access-ci-org/Jetstream_Cluster #13

Problem running multi-node MPI jobs

Hello, I'm hitting problems running MPI jobs that require more than one node using the system install of OpenMPI 4.1.1 in Rocky 8.6. Specifically, the following script runs on a single 2 core `m3…

cwsmith updated 1 year ago
2
