[Closed] zapplelove closed this issue 2 years ago
I just copied some image files into the folder and the training started, but it stopped after only a few seconds. The training log is:
21-04-02 13:02:02.176 : task: srmd
model: plain
gpu_ids: [0]
scale: 4
n_channels: 3
sigma: [0, 50]
sigma_test: 0
merge_bn: False
merge_bn_startpoint: 400000
path:[
root: superresolution
pretrained_netG: None
task: superresolution/srmd
log: superresolution/srmd
options: superresolution/srmd/options
models: superresolution/srmd/models
images: superresolution/srmd/images
]
datasets:[
train:[
name: train_dataset
dataset_type: srmd
dataroot_H: trainsets/trainH
dataroot_L: None
H_size: 96
dataloader_shuffle: True
dataloader_num_workers: 8
dataloader_batch_size: 64
phase: train
scale: 4
n_channels: 3
]
test:[
name: test_dataset
dataset_type: srmd
dataroot_H: testsets/set5
dataroot_L: None
phase: test
scale: 4
n_channels: 3
]
]
netG:[
net_type: srmd
in_nc: 19
out_nc: 3
nc: 128
nb: 12
gc: 32
ng: 2
reduction: 16
act_mode: R
upsample_mode: pixelshuffle
downsample_mode: strideconv
init_type: orthogonal
init_bn_type: uniform
init_gain: 0.2
scale: 4
]
train:[
G_lossfn_type: l1
G_lossfn_weight: 1.0
G_optimizer_type: adam
G_optimizer_lr: 0.0001
G_optimizer_clipgrad: None
G_scheduler_type: MultiStepLR
G_scheduler_milestones: [200000, 400000, 600000, 800000, 1000000, 2000000]
G_scheduler_gamma: 0.5
G_regularizer_orthstep: None
G_regularizer_clipstep: None
checkpoint_test: 5000
checkpoint_save: 5000
checkpoint_print: 200
]
opt_path: options/train_srmd.json
is_train: True
21-04-02 13:02:02.176 : calculating PCA projection matrix...
21-04-02 13:04:02.431 : done!
21-04-02 13:04:02.431 : Random seed: 3448
[The identical configuration block, followed by "loading PCA projection matrix..." and a fresh random seed, is reprinted at every restart (13:05:21, 13:16:42, 13:43:07, 13:46:37, 13:47:16, 13:59:06, 14:09:39, 14:11:17); the repeats are omitted here.]
21-04-02 14:11:17.294 : Number of train images: 70, iters: 2
21-04-02 14:11:20.422 :
Networks name: SRMD
Params number: 1553200
Net structure:
SRMD(
(model): Sequential(
(0): Conv2d(19, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): ReLU(inplace=True)
(18): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(19): ReLU(inplace=True)
(20): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(21): ReLU(inplace=True)
(22): Conv2d(128, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(23): PixelShuffle(upscale_factor=4)
)
)
21-04-02 14:11:20.429 :
| mean | min | max | std || shape
| 0.000 | -0.057 | 0.059 | 0.015 | torch.Size([128, 19, 3, 3]) || model.0.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.0.bias
| 0.000 | -0.027 | 0.026 | 0.006 | torch.Size([128, 128, 3, 3]) || model.2.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.2.bias
| 0.000 | -0.025 | 0.027 | 0.006 | torch.Size([128, 128, 3, 3]) || model.4.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.4.bias
| 0.000 | -0.025 | 0.027 | 0.006 | torch.Size([128, 128, 3, 3]) || model.6.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.6.bias
| -0.000 | -0.028 | 0.025 | 0.006 | torch.Size([128, 128, 3, 3]) || model.8.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.8.bias
| 0.000 | -0.026 | 0.026 | 0.006 | torch.Size([128, 128, 3, 3]) || model.10.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.10.bias
| 0.000 | -0.025 | 0.028 | 0.006 | torch.Size([128, 128, 3, 3]) || model.12.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.12.bias
| -0.000 | -0.027 | 0.025 | 0.006 | torch.Size([128, 128, 3, 3]) || model.14.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.14.bias
| -0.000 | -0.024 | 0.028 | 0.006 | torch.Size([128, 128, 3, 3]) || model.16.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.16.bias
| 0.000 | -0.025 | 0.025 | 0.006 | torch.Size([128, 128, 3, 3]) || model.18.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.18.bias
| 0.000 | -0.025 | 0.027 | 0.006 | torch.Size([128, 128, 3, 3]) || model.20.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([128]) || model.20.bias
| 0.000 | -0.022 | 0.026 | 0.006 | torch.Size([48, 128, 3, 3]) || model.22.weight
| 0.000 | 0.000 | 0.000 | 0.000 | torch.Size([48]) || model.22.bias
/home/zpl/local/venv/py3_torch/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/home/zpl/local/venv/py3_torch/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:156: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
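The two UserWarnings above are about call order and a deprecated `epoch` argument, not about the slowdown itself. A minimal sketch of the order PyTorch expects since 1.1 (the `Linear` model and dummy loss here are stand-ins, not KAIR's actual training loop):

```python
import torch

model = torch.nn.Linear(4, 4)  # stand-in for netG
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[200000, 400000], gamma=0.5)

for step in range(3):
    optimizer.zero_grad()
    loss = model(torch.randn(2, 4)).abs().mean()  # dummy L1-style loss
    loss.backward()
    optimizer.step()   # first: update the parameters
    scheduler.step()   # then: advance the LR schedule (no epoch argument)
```

Calling `scheduler.step()` before `optimizer.step()`, or passing it an epoch number, triggers exactly the warnings shown in the log; the fix is harmless to training results apart from the first scheduled LR value.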
Is it because my computer is too slow?
Have you solved this problem? I'm in the same situation, without having changed any training parameters. I found that the time mainly goes to the data loader at the start of each epoch: CPU usage spikes, and it takes a very long time to finish loading. I don't know how to fix it.
Same question here, me too!!!
Please set num_workers to 0.
Setting num_workers to 0 does nothing for the low GPU utilization!
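For what it's worth, the per-epoch stall described above is often the cost of re-spawning DataLoader worker processes at every epoch rather than the workers themselves. A hedged sketch (not KAIR's actual loader code; the `TensorDataset` is a stand-in for the 70 HR crops) of the knobs worth trying on PyTorch >= 1.7:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(70, 3, 96, 96))  # stand-in for 70 training crops
loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=2,            # try 0 to rule out worker overhead entirely
    pin_memory=True,          # faster host-to-GPU copies
    persistent_workers=True,  # avoid re-forking workers each epoch (needs num_workers > 0)
)
n_batches = sum(1 for _ in loader)  # 70 images / batch 64 -> 2 batches per epoch
```

`persistent_workers=True` keeps the worker processes alive between epochs, which directly targets the "slow at the start of every epoch" symptom; it does not by itself fix low GPU utilization if the per-sample preprocessing is the bottleneck.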
Please refer to https://github.com/JingyunLiang/SwinIR/issues/57#issuecomment-1109407381 and cut the large images in the training set into small patches.
My training log is:
Sorry, writing in English is tedious for me, so I'll describe the problem in Chinese below. I did the following calculation: training started at 09:20:57 and the result came out at 06:49:26 the next morning, i.e. 22 hours 27 minutes in total for 90 epochs, so about 15 minutes per epoch on average. The program is set to train for 1,000,000 steps, which would take 10,393.5 days in total, i.e. 28.5 years. That training speed is far too slow. Is there any way to speed it up? Also, GPU utilization stays very low the whole time, and I don't know why. Is there a solution?
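A back-of-envelope check of that estimate, using only numbers from this thread (22 h 27 min for 90 epochs; 70 train images at batch size 64, hence "iters: 2" per epoch in the log):

```python
minutes_per_epoch = (22 * 60 + 27) / 90  # ~15 minutes per epoch

# Treating the 1,000,000 setting as epochs reproduces the 10,393.5-day figure:
days_if_epochs = 1_000_000 * minutes_per_epoch / (60 * 24)

# But it is an iteration count, and each epoch here is only 2 iterations,
# so the projection halves -- still hopeless, which suggests the ~15 min/epoch
# is almost entirely per-epoch DataLoader overhead, not GPU compute.
days_if_iters = (1_000_000 / 2) * minutes_per_epoch / (60 * 24)

print(round(minutes_per_epoch, 1), round(days_if_epochs, 1), round(days_if_iters, 1))
# -> 15.0 10393.5 5196.8
```

Either way, with 2 iterations taking 15 minutes of wall time, the GPU is idle almost the entire epoch, which matches the low-utilization reports above.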