-
# The issue
Given a specific checkpoint, load it in two different settings:
1. Load it with 64 nodes, 512 GPUs, 512 processes (1 GPU / process).
2. Load it with 64 nodes, 512 GPUs, 64 processes…
-
Hello,
I have set up a remote host that I want to connect to from another (local) computer.
I have set up SSH access correctly.
My hostfile.txt contains only the IP of the remote host:
10.1.1…
-
### The purpose and use-cases of the new component
The DNS lookup Processor is for resolving hostnames to IP addresses and vice versa. It is particularly useful when the GeoIP processor receives a …
-
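In hetero mode, is it mandatory to set enable_hetero:True? Does the number of GPUs have to be greater than 4? Is a hostfile required? Looking forward to your patient answers!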
-
#! /bin/bash
NUM_WORKERS=1
NUM_GPUS_PER_WORKER=1
MP_SIZE=1
# Quote expansions so paths containing spaces are handled correctly.
script_path=$(realpath "$0")
script_dir=$(dirname "$script_path")
main_dir=$(dirname "$script_dir")
MODEL_TYPE="XrayGLM"
MODEL_ARGS="--ma…
-
Hi, I am using LSF on LLNL's Lassen cluster.
Link is https://hpc.llnl.gov/hardware/compute-platforms/lassen.
The script seems to require the environment variable LSB_AFFINITY_HOSTFILE to be set.
…
-
Hi,
Would you be open to supporting Linux hosts files as part of this? It would be great to extend this to cover hosts-file lookups, which `dns.resolve` does not provide.
Cheers,
Andrew
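
To make the request above concrete, here is a minimal sketch of what a hosts-file lookup table could look like. It is written in Python for illustration only and is not the project's implementation; `parse_hosts`, `SAMPLE`, and the sample entries are hypothetical names and data.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname or alias to the list of IP addresses that name it.

    Assumes the usual hosts-file layout: an address, then one or more
    names, with '#' starting a comment.
    """
    table = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no names carries no mapping
        address, names = fields[0], fields[1:]
        try:
            ipaddress.ip_address(address)  # accepts IPv4 and IPv6
        except ValueError:
            continue  # skip lines whose first field is not an IP address
        for name in names:
            # Hostnames are case-insensitive, so normalize the key.
            table.setdefault(name.lower(), []).append(address)
    return table


# Hypothetical sample content, not taken from any real machine.
SAMPLE = """
127.0.0.1   localhost
::1         localhost ip6-localhost
10.1.1.5    build-box build-box.internal  # hypothetical entry
not-an-ip   ignored
"""

hosts = parse_hosts(SAMPLE)
```

A resolver could consult a table like this first and fall back to a regular DNS query only when the name is absent.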
-
Good afternoon all
`glogin` (GWDG Emmy) has undergone some hardware and software upgrades recently. Since the upgrade, I find jobs launched with `srun` are considerably slower than jobs launched w…
-
A question:
in this example,
ZeRO/DeepSpeedExamples/mnist/run_ds.sh
the arguments --deepspeed and --deepspeed_config that are passed in are not actually parsed, are they?
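
For reference, a minimal sketch of how a training script could accept these two flags with plain `argparse`. This is an assumption for illustration, not the code of the example script; DeepSpeed's own helper `deepspeed.add_config_arguments(parser)` registers these arguments, and `build_parser` and `ds_config.json` below are hypothetical names.

```python
import argparse


def build_parser():
    """Build a parser that recognizes the two DeepSpeed-related flags."""
    parser = argparse.ArgumentParser(description="Training script (sketch)")
    # Boolean switch: present on the command line means True.
    parser.add_argument("--deepspeed", action="store_true",
                        help="enable DeepSpeed")
    # Path to a JSON configuration file.
    parser.add_argument("--deepspeed_config", type=str, default=None,
                        help="path to the DeepSpeed JSON config")
    return parser


# parse_known_args leaves unrecognized arguments in `unknown` instead of
# exiting, which makes it easy to see which flags the parser understood.
args, unknown = build_parser().parse_known_args(
    ["--deepspeed", "--deepspeed_config", "ds_config.json"]
)
default_args, _ = build_parser().parse_known_args([])
```

If neither argument is registered on the script's parser, a strict `parse_args()` call would reject them as unrecognized.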
-
**Describe the bug**
I am trying to launch multiple Megatron-DeepSpeed jobs on a Slurm-based cluster. For each job, I want to create a different hostfile called hostfile_${SLURM_JOBID}. However, when…