-
## 🚀 Feature
When using DataParallel, the engine creates a thread for each GPU during the forward pass. It should do the same for backpropagation too, including within Python-implemented autograd.Functions.
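For context, a minimal sketch of the case this covers, with a toy Python-level `autograd.Function` (all names here are illustrative): under `DataParallel`, `forward` already runs in one thread per GPU, and the request is for `backward` to do the same.

```python
import torch
import torch.nn as nn

class Double(torch.autograd.Function):
    # A Python-implemented Function: DataParallel runs forward() in a
    # per-GPU thread today; the request is to parallelize backward() too.
    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * 2

class Net(nn.Module):
    def forward(self, x):
        return Double.apply(x)

model = nn.DataParallel(Net()).cuda()
loss = model(torch.randn(16, 4, device="cuda")).sum()
loss.backward()  # the feature request: one thread per GPU here as well
```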
## Motivation
We'r…
-
Hi,
I have 8 GPUs that can each run a wandb agent; how can I utilize all of them?
From https://wandb.ai/site/articles/multi-gpu-sweeps, I learned the command `CUDA_VISIBLE_DEVICES=0 wandb agent SWEEP_ID`
…
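A common way to use all 8 GPUs, following the pattern in that article, is to launch one agent per GPU and pin each process to a single device via `CUDA_VISIBLE_DEVICES`. A hedged sketch (the sweep ID is a placeholder):

```python
import os
import subprocess

SWEEP_ID = "entity/project/sweep_id"  # placeholder; substitute your real sweep ID

# One wandb agent per GPU, each pinned to a single device.
procs = []
for gpu in range(8):
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
    procs.append(subprocess.Popen(["wandb", "agent", SWEEP_ID], env=env))

for p in procs:
    p.wait()
```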
-
Hi,
thank you for the awesome idea behind this nice multi-dataset object detector.
I've trained a "Partitioned detector" on COCO, OID, and Objects365-V2. (For V2, because MEGVII has updated their …
-
Thanks a lot for sharing this awesome project!
I really want to try the training part, but I didn't find any experiment details in the README or your arXiv paper. Is it possible for you to ki…
-
Dear HangZhang,
Thanks for your code! I was working on a multi-GPU version of my code, which looks like:
```python
model = CDCK2(xxx)
model = encoding.parallel.DataParallelModel(model).cuda()
```
The model is …
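For reference, the pattern documented in PyTorch-Encoding pairs `DataParallelModel` with `DataParallelCriterion`, since the model returns one output per GPU. A minimal sketch, with a stand-in module in place of `CDCK2` and assumed shapes:

```python
import torch
import torch.nn as nn
import encoding

model = nn.Linear(128, 10)  # stand-in for CDCK2(...)
model = encoding.parallel.DataParallelModel(model).cuda()

# DataParallelModel returns a list of per-GPU outputs, so the criterion
# is wrapped as well to compute the loss on each device before reducing:
criterion = encoding.parallel.DataParallelCriterion(nn.CrossEntropyLoss())

inputs = torch.randn(32, 128, device="cuda")
target = torch.randint(0, 10, (32,), device="cuda")

outputs = model(inputs)            # list: one tensor per GPU
loss = criterion(outputs, target)  # per-GPU losses reduced to a scalar
loss.backward()
```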
-
This is my first time using WebDataset and I have multiple shards (about 60) with a large number of images. It was working as I expected with the normal Dataset class when I was using a single GPU. …
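For multi-GPU (DDP) use, a common WebDataset sketch splits the ~60 shards across ranks instead of letting every process read them all; the shard naming pattern and sample keys below are assumptions:

```python
import webdataset as wds

urls = "shards/data-{000000..000059}.tar"  # hypothetical naming for ~60 shards

dataset = (
    wds.WebDataset(urls, nodesplitter=wds.split_by_node)  # each rank sees a shard subset
    .shuffle(1000)
    .decode("pil")
    .to_tuple("jpg", "cls")  # assumed keys stored in the shards
)

loader = wds.WebLoader(dataset, batch_size=32, num_workers=4)
```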
-
**Environment:**
1. Framework: TensorFlow
2. Framework version: 2.12.0
3. Horovod version: 0.26.1
**Bug report:**
The unit test case **_test/parallel/test_tensorflow.py::TensorFlowTests::test_h…
-
Thanks for the great repo.
I notice that the README only gives a suggestion for training on one node with 8 GPUs.
So I wonder how the code supports multi-node training, such as:
# For node 0
…
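If the script uses standard `torch.distributed`, a typical two-node launch might look like the following sketch (the script name, node IP, and port are placeholders, not taken from this repo):

```python
# Hypothetical launch commands:
#   node 0: torchrun --nnodes=2 --nproc_per_node=8 --node_rank=0 \
#               --master_addr=NODE0_IP --master_port=29500 train.py
#   node 1: torchrun --nnodes=2 --nproc_per_node=8 --node_rank=1 \
#               --master_addr=NODE0_IP --master_port=29500 train.py
#
# Inside train.py, the process group picks up the variables torchrun sets:
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # reads RANK, WORLD_SIZE, MASTER_ADDR/PORT
```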
-
Hello,
Any plans to have a script for training XLNet on distributed GPUs?
Maybe with Horovod or MultiWorkerMirroredStrategy?
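For the `MultiWorkerMirroredStrategy` route, a minimal sketch of the scaffolding (the stand-in model and compile settings are placeholders; each worker would set `TF_CONFIG` to describe the cluster):

```python
import tensorflow as tf

# TF_CONFIG in each worker's environment describes the cluster topology.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Stand-in model; an XLNet body would be built here instead.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```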
-
[7_gpu_pp1.log](https://github.com/FlagOpen/FlagScale/files/14720175/7_gpu_pp1.log)
[31_gpu_pp4.log](https://github.com/FlagOpen/FlagScale/files/14720184/31_gpu_pp4.log)