-
Hi,
First of all, thanks for the great library. I'm trying to train a model, specifically pointnet2 charlesmsg, and in order to use a larger batch size I would like to train the model on two GPUs. I…
-
It would be useful for the RAPIDS effort to have a multi-node join computation deployed from Kubernetes. Until UCX arrives this will likely be slow, but we can probably work on deployment and configu…
-
Hello,
I came across your work, and was wondering whether loading and training models on multiple GPUs was possible.
I saw in the YOLOv7 repo that it was possible with the following command line…
-
Hi, we've found a few problems in recent tasks.
It seems the code shown in the files doesn't have any details.
The --data_dir in the OFrecord running code is so confusing that we must set the path by oursel…
-
Hi Zhang,
Thank you for sharing your nice work. We found that your provided docres.pkl achieves promising results on the Doc3D dataset. We then tried to train a new DocRes model on Doc3D based on the same set…
-
I saw an error message when trying to do supervised fine-tuning with 4xA100 GPUs. So the free version cannot be used on multiple GPUs?
RuntimeError: Error: More than 1 GPUs have a lot of VRAM usa…
-
In multi-GPU systems it can take quite a while to initialize all GPUs, so benchmarking tasks with frameworks like hashtopolis can take quite a chunk of time.
I would propose benchmarking with only one…
-
When tuning is active during multi-GPU runs, each GPU independently tunes each kernel. This results in different GPUs using different launch configurations for the final kernel launch and finally makes…
-
**[Original report](https://bitbucket.org/icldistcomp/parsec/issue/242) by Qinglei Cao (Bitbucket: [Qinglei_Cao](https://bitbucket.org/Qinglei_Cao)).**
----------------------------------------
If …
-
I am attempting to train a model using the tools/dist_train.sh script as documented, but I'm encountering several errors and warnings during execution. Below is the command I used and the correspondin…