the training process stopped after

guess-who-ami commented 2 years ago

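I don't understand why training is so short, or why the epochs progress bar advances faster than the train progress bar. Adding the synthetic dataset made no difference: the epochs bar reaches 100% while the train bar is still unfinished, and then the process hangs.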

guess-who-ami commented 2 years ago

Screenshot from 2022-01-07 13-32-59

guess-who-ami commented 2 years ago

After python3 -m train.train_linemod_pvn3d --cls ape the output is:

cls_type: ape
cls_id in lm_dataset.py 1
Train without rendered data from https://github.com/ethnhe/raster_triangle
Train without fuse data from https://github.com/ethnhe/raster_triangle
train_dataset_size: 186 real+ 0 render+ 0 fuse
train_dataset_size: 186
cls_id in lm_dataset.py 1
val_dataset_size: 1050
loading pretrained mdl.
/home/yuenlin/.venvs/pvn3d/lib/python3.6/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
{'bn_decay': 0.5, 'bn_momentum': 0.9, 'cal_metrics': False, 'checkpoint': None, 'cls': 'ape', 'decay_step': 200000.0, 'epochs': 1000, 'eval_net': False, 'lr': 0.01, 'lr_decay': 0.5, 'run_name': 'sem_seg_run_1', 'test': False, 'test_occ': False, 'weight_decay': 0}
epochs: 0%| | 0/25 [00:00<?, ?it/s]
/home/yuenlin/.venvs/pvn3d/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/home/yuenlin/.venvs/pvn3d/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:143: UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
/home/yuenlin/.venvs/pvn3d/lib/python3.6/site-packages/torch/nn/functional.py:2796: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/yuenlin/.venvs/pvn3d/lib/python3.6/site-packages/torch/nn/functional.py:2973: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/yuenlin/.venvs/pvn3d/lib/python3.6/site-packages/torch/nn/modules/container.py:100: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  input = module(input)
/home/yuenlin/documents/PVN3D-pytorch-1.5/pvn3d/lib/loss.py:29: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  logpt = F.log_softmax(input)
epochs: 100%|████████████████████████████████████████| 25/25 [46:09<00:00, 110.79s/it]
train: 93%|█████████████████████████████████████▍  | 4650/5000 [46:09<03:27, 1.69it/s, total_it=4650]

LEONHWH commented 2 years ago

Train without rendered data from https://github.com/ethnhe/raster_triangle
Train without fuse data from https://github.com/ethnhe/raster_triangle

I notice that you did not generate the rendered data and the fused data for training.
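The counters in the posted log are consistent with a training set that is too small to fill the train bar: 25 epochs over 186 samples give exactly the 4650 iterations shown, which is short of the bar's total of 5000. The sketch below only reproduces that arithmetic. The function name `iterations_run` and the assumption of one iteration per sample (batch size 1) are illustrative and not taken from the PVN3D code; the numbers come from the log. It may account for the epochs bar finishing first, but it does not by itself explain why the process hangs afterwards.

```python
def iterations_run(samples_per_epoch: int, epochs: int, batch_size: int = 1) -> int:
    """Total training iterations, assuming ceil(samples / batch_size) per epoch.

    batch_size=1 is an assumption chosen because it matches the log
    (4650 iterations / 25 epochs = 186 iterations per epoch).
    """
    batches_per_epoch = -(-samples_per_epoch // batch_size)  # ceiling division
    return batches_per_epoch * epochs


# Values read from the posted log.
real, render, fuse = 186, 0, 0   # "186 real+ 0 render+ 0 fuse"
epochs = 25                      # "epochs: ... 25/25"
train_bar_total = 5000           # "train: ... 4650/5000"

done = iterations_run(real + render + fuse, epochs)
print(done)                                # 4650
print(f"{done / train_bar_total:.0%}")     # 93%
print(done >= train_bar_total)             # False: the train bar cannot reach 100%
```

Under the same assumption, the train bar would only fill within 25 epochs if each epoch held at least 200 samples, since 25 × 200 = 5000.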

weidu3 commented 1 year ago

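I don't understand why training is so short, or why the epochs progress bar advances faster than the train progress bar. Adding the synthetic dataset made no difference: the epochs bar reaches 100% while the train bar is still unfinished, and then the process hangs.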

I recently ran into the same problem. Have you solved it? I am not using synthetic data.

ethnhe / PVN3D

the training process stopped after #101