-
Hi! I really appreciate your work! When I run your multi-GPU code, I hit the following problem. It looks like some layers are on different devices. Could you please help me with that?
```
Traceback…
-
I have adapted this to use PyTorch's simple `DataParallel`, but the model sometimes outputs NaNs. Have you been able to train this across multiple GPUs on a single node?
-
How can I run training on multiple GPUs? As far as I can see, training runs on a single GPU.
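For readers with the same question, here is a minimal sketch of how a single-node multi-GPU launch line is usually put together with PyTorch's `torchrun` launcher. It assumes the training script already supports `DistributedDataParallel`, which this thread does not confirm. The script name `train.py` and the helper `build_torchrun_command` are hypothetical and not part of this repository.

```python
import shlex


def build_torchrun_command(script, nproc_per_node, script_args=()):
    """Build a single-node torchrun launch line (hypothetical helper).

    Assumes `script` is a DistributedDataParallel-aware entry point;
    nothing in this thread confirms that for this repository.
    """
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    cmd = [
        "torchrun",
        "--standalone",  # single-node run, no external rendezvous server
        f"--nproc_per_node={nproc_per_node}",  # one worker process per GPU
        script,
        *script_args,
    ]
    # shlex.join quotes arguments so the line can be pasted into a shell
    return shlex.join(cmd)


if __name__ == "__main__":
    # "train.py" is a placeholder script name, not a file from this repo
    print(build_torchrun_command("train.py", 2))
```

If the script only ever creates one process and never initializes a process group, launching it this way will not spread the work across GPUs by itself.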
-
When I try to run the `train_caption.py` script like this:
```
export Data_ROOT=path/to/coco_dataset
python train_caption.py exp.name=caption_rds moel.detector.checkpoint=4ds_detector_path
```
i en…
-
Hi, Multi-GPU training has been consistently failing. Would it be possible to provide a screenshot of 'pip list' to see the version of each package installed, or if there is an environment image file …
-
When can we expect a code update for GPU-based pre-training and fine-tuning instead of TPU? @ellisbrown @penghao-wu @tsb0601
-
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model par…
-
### 🐛 Describe the bug
```
[2024-05-27 08:06:37] INFO - sg_trainer.py - Started training for 300 epochs (0/299)
Train epoch 0: 0%| | 0/4690 [00:02
-
Hi, I am trying to use multi-GPU training on Kaggle with two Tesla T4s.
My code only runs on one GPU; the other is not utilized.
I am able to train with a custom dataset and get acceptable results…
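One quick thing to rule out for the single-GPU symptom above is whether the process can see both devices at all. The sketch below only inspects the `CUDA_VISIBLE_DEVICES` environment variable; `visible_gpu_ids` is a hypothetical helper, and it does not tell you whether the training code actually uses every visible GPU.

```python
import os


def visible_gpu_ids(env=None):
    """Return the ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset.

    Hypothetical helper for illustration; it only reads the environment
    and does not query the driver or any framework.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Unset: the variable places no restriction on which GPUs are visible
        return None
    # An empty value hides every GPU, so this yields an empty list
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


if __name__ == "__main__":
    ids = visible_gpu_ids()
    if ids is None:
        print("CUDA_VISIBLE_DEVICES is unset; no restriction from the environment")
    else:
        print(f"{len(ids)} GPU id(s) visible to this process: {ids}")
```

If this reports only one id on a two-GPU machine, the restriction comes from the environment rather than from the training code.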
-
I was trying to run training on multiple GPU servers in AWS, but it is not training as expected. Is there a way to enable this?