-
Hi,
I am trying to launch the `dinov2/train/train.py` script directly, without the Slurm scheduler. I use the following command to launch the training:
```
export CUDA_VISIBLE_DEVICES=0,1 && python dino…
-
When your training script uses DDP to run on one or more nodes, it spawns multiple processes, each running on a different GPU. Every process needs to know how many other processes are …
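As a rough illustration of how each process can find its place in a DDP job, the sketch below reads the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables that `torchrun` sets for every worker. The `DistEnv` class and the `read_dist_env` helper are hypothetical names, and the single-process fallback values are an assumption for the case where no launcher set the variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DistEnv:
    """Hypothetical container for the values one worker process needs."""
    rank: int        # global index of this process across all nodes
    local_rank: int  # index of this process on its own node (usually the GPU index)
    world_size: int  # total number of processes in the job


def read_dist_env(env: Mapping[str, str] = os.environ) -> DistEnv:
    """Read the launcher-provided variables.

    Assumption: if a variable is missing, fall back to a single-process
    setup (rank 0, local rank 0, world size 1).
    """
    return DistEnv(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


if __name__ == "__main__":
    info = read_dist_env()
    print(f"process {info.rank} of {info.world_size}, local index {info.local_rank}")
```

Run without a launcher, this reports a single process. Under a launcher that exports these variables, each worker would report its own rank.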
-
### Problem
It would be nice to see some more information about resource utilization.
### Solution
- [ ] Resources used by each instance on each node
- [ ] Overall available resources…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue y…
-
After being able to start two cameras as nodes (#119), I want to start the two camera nodes from a launch file as ```ComposableNode``` to allow intra-process communication.
Running the following lau…
-
Hello @tdrussell,
First of all, thank you very much for your great repo! It is absolutely great work to pull all these optimization solutions together.
When I use the repo, I try to use the bf16 e…
-
Hi,
I hope you are doing well.
What is the maximum bit rate that your UART IP can achieve?
Is it possible to reach a speed between 16 Mbps and 20 Mbps? What are the conditions or constrain…
-
Hi, I observe the following system metrics:
![image](https://github.com/lucidrains/imagen-pytorch/assets/44346535/59d50dfb-b22a-4e26-9515-ed8ef838f5c8)
As I expect the GPU to be highly utilized …
-
Hi, first I would like to thank the contributors for providing such an elegant and easy-to-use library to profile MPI programs.
My problem:
I built an MPI cluster within a LAN with up to 8 devices (L…
-
By default, BigDL partitions the model and the dataset into the same number of partitions. I am wondering whether it is possible to set the number of model partitions to differ from the number of dataset RDD partitions (i.e. only 1 par…