cschenxiang / DRSformer

Learning A Sparse Transformer Network for Effective Image Deraining (CVPR 2023)

GPU not utilized during training. #22

Closed · liu-bohan closed 4 months ago

liu-bohan commented 4 months ago

Thank you for your excellent work!

I'm currently running into a problem during training: the GPU is not being utilized at all, and training runs on the CPU instead.

Since my machine has only one GPU, I made a few changes to train.sh to remove the distributed launch:

#!/usr/bin/env bash
#CONFIG=$1
export NCCL_P2P_DISABLE=1
python setup.py develop --no_cuda_ext
# CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 basicsr/train.py -opt Options/Deraining.yml --launcher pytorch
CUDA_VISIBLE_DEVICES=0 python basicsr/train.py -opt Options/Deraining.yml
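
For reference, a quick check like the following (just standard PyTorch calls, nothing repo-specific; an ad-hoc diagnostic rather than part of DRSformer) can confirm whether the Python environment used by train.sh sees the card at all:

# Quick sanity check: can this Python environment see the GPU?
# (Ad-hoc diagnostic, assuming a standard PyTorch + CUDA install; not part of the repo.)
import os
import torch

print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("CUDA available:", torch.cuda.is_available())   # expect True if driver/toolkit match
print("Device count:", torch.cuda.device_count())     # expect 1 on this machine
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))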

After running bash train.sh, the log showed that the single GPU was recognized:

2024-04-26 02:54:51,734 INFO: Training statistics:
Number of train images: 3
Dataset enlarge ratio: 1
Batch size per gpu: 8
World size (gpu number): 1
Require iter number per epoch: 1
Total epochs: 300000; iters: 300000.
2024-04-26 02:54:51,734 INFO: Dataset Dataset_PairedImage - ValSet is created.
2024-04-26 02:54:51,734 INFO: Number of val images/folders in ValSet: 3
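
For completeness, a minimal standalone test along these lines (a sketch only, assuming a plain PyTorch install; it is not DRSformer code) would show whether a model explicitly moved to the GPU actually computes there:

# Minimal standalone test: does a model moved to cuda:0 actually run on the GPU?
# (Diagnostic sketch, not DRSformer code; assumes a standard PyTorch install.)
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device)
x = torch.randn(1, 3, 128, 128, device=device)
out = net(x)

print("parameter device:", next(net.parameters()).device)  # expect cuda:0
print("output device:", out.device)                         # expect cuda:0
if torch.cuda.is_available():
    print("GPU memory allocated (bytes):", torch.cuda.memory_allocated(0))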

Any help is appreciated!