Closed: hcleung3325 closed this issue 3 years ago.
MANet training doesn't take much memory. Did you turn on `cal_lr_psnr`?
https://github.com/JingyunLiang/MANet/blob/34f90ba8888f4a1dd2a1127b97c2ec3706f06598/codes/options/train/train_stage1.yml#L28
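For reference, that flag sits in the general settings of `train_stage1.yml` (line 28 in the linked file); this is just an excerpt of how it appears, matching the config posted below:

```yaml
# excerpt from train_stage1.yml (general settings)
cal_lr_psnr: False  # calculate lr psnr consumes huge memory; keep False unless needed
```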
Thanks for the reply. No, it stays at False. Here is my full config:
#### general settings
name: 001_MANet_aniso_x4_TMO_40_stage1
use_tb_logger: true
model: blind
distortion: sr
scale: 4
gpu_ids: [1]
kernel_size: 21
code_length: 15

# train
sig_min: 0.7 # 0.7, 0.525, 0.35 for x4, x3, x2
sig_max: 10.0 # 10, 7.5, 5 for x4, x3, x2
train_noise: False
noise_high: 15
train_jpeg: False
jpeg_low: 70

# validation
sig: 1.6
sig1: 6 # 6, 5, 4 for x4, x3, x2
sig2: 1
theta: 0
rate_iso: 0 # 1 for iso, 0 for aniso
test_noise: False
noise: 15
test_jpeg: False
jpeg: 70

pca_path: ./pca_matrix_aniso21_15_x4.pth
cal_lr_psnr: False # calculate lr psnr consumes huge memory

#### datasets
datasets:
  train:
    name: TMO
    mode: GT
    dataroot_GT: ../datasets/HR
    dataroot_LQ: ~

    use_shuffle: true
    n_workers: 8
    batch_size: 4
    GT_size: 192
    LR_size: ~
    use_flip: true
    use_rot: true
    color: RGB
  val:
    name: Set5
    mode: GT
    dataroot_GT: ../../data
    dataroot_LQ: ~

#### network structures
network_G:
  which_model_G: MANet_s1
  in_nc: 3
  out_nc: ~
  nf: ~
  nb: ~
  gc: ~
  manet_nf: 128
  manet_nb: 1
  split: 2

#### path
path:
  pretrain_model_G: ~
  strict_load: true
  resume_state: ~ #../experiments/001_MANet_aniso_x4_DIV2K_40_stage1/training_state/5000.state

#### training settings: learning rate scheme, loss
train:
  lr_G: !!float 2e-4
  lr_scheme: MultiStepLR
  beta1: 0.9
  beta2: 0.999
  niter: 300000
  warmup_iter: -1
  lr_steps: [100000, 150000, 200000, 250000]
  lr_gamma: 0.5
  restarts: ~
  restart_weights: ~
  eta_min: !!float 1e-7

  kernel_criterion: l1
  kernel_weight: 1.0

  manual_seed: 0
  val_freq: !!float 2e7

#### logger
logger:
  print_freq: 200
  save_checkpoint_freq: !!float 2e4
It's strange because MANet is a tiny model and consumes little memory. Do you have any problems testing the model? Can you try to set `manet_nf=32` in training?
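For reference, a minimal sketch of that change in the `network_G` block, assuming everything else stays as in the config posted above:

```yaml
# sketch: network_G with the suggested reduced feature count
network_G:
  which_model_G: MANet_s1
  in_nc: 3
  out_nc: ~
  nf: ~
  nb: ~
  gc: ~
  manet_nf: 32   # reduced from 128 to lower memory usage
  manet_nb: 1
  split: 2
```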
Thanks for the reply. I have tried `manet_nf=32` and it still goes OOM. Is `python train.py --opt options/train/train_stage1.yml` the right command to run?
I think it's a problem with your GPU. Can you train other models normally? Can you test MANet on your GPU?
My GPU is a 2080 Ti with only 11 GB. Do I need a GPU with more memory to train it?
I don't think so. A 2080 Ti should at least be enough when `manet_nf=32`. Can you monitor the GPU usage with `watch -d -n 0.5 nvidia-smi` when you start training the model?
Thanks a lot. The problem is solved. I can run the training now.
Thanks for your code. I tried to train the model with train_stage1.yml and got a CUDA OOM error. I am using a 2080 Ti. I tried reducing the batch size from 16 to 2 and GT_size from 192 to 48, but training still runs out of memory. Is there anything I missed? Thanks.
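For reference, the reductions described above correspond to something like this in the `datasets: train:` block of train_stage1.yml (a sketch; the remaining keys stay as in the full config posted earlier):

```yaml
# sketch: reduced training-patch settings tried in this issue
datasets:
  train:
    batch_size: 2   # reduced from 16
    GT_size: 48     # reduced from 192
```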