ajhamdi / MVTN

PyTorch implementation of the ICCV'21 paper "MVTN: Multi-View Transformation Network for 3D Shape Recognition"

Worse accuracy while continuing training due to a possible mistake in initializing `setup` #9

Open Kumoi0728 opened 2 years ago

Kumoi0728 commented 2 years ago

I started training an MVTN model, configured for 100 epochs, with the following command and stopped it after 57 epochs.

python run_mvtn.py --data_dir data/ModelNet40/ --run_mode train --mvnetwork mvcnn --epochs 100 --nb_views 1 --views_config learned_circular

The output of the 57th epoch was:

Epoch: [57/100]
Iter [50/492] Loss: 0.7633
Iter [100/492] Loss: 0.7892
Iter [150/492] Loss: 0.3939
Iter [200/492] Loss: 0.1820
Iter [250/492] Loss: 0.2282
Iter [300/492] Loss: 0.6939
Iter [350/492] Loss: 0.4468
Iter [400/492] Loss: 0.2383
Iter [450/492] Loss: 0.5454
Evaluation:
train acc: 82.03 - train Loss: 0.6457
Val Acc: 71.31 - val Loss: 1.0960
Current best val acc: 72.61

When I loaded the trained model to continue training, it correctly resumed from the 58th epoch, but the accuracies dropped:

Epoch: [58/100]
Iter [50/492] Loss: 1.2060
Iter [100/492] Loss: 0.6699
Iter [150/492] Loss: 0.5014
Iter [200/492] Loss: 0.4189
Iter [250/492] Loss: 0.2721
Iter [300/492] Loss: 0.3099
Iter [350/492] Loss: 1.0518
Iter [400/492] Loss: 1.1512
Iter [450/492] Loss: 0.2506
Evaluation:
train acc: 55.48 - train Loss: 1.6519
Val Acc: 60.13 - val Loss: 1.4470
Current best val acc: 72.61

I found that in ops.py, lines 260-264, the trained MVTN model is loaded only when is_learning_views is True:

if setup["is_learning_views"]:
    models_bag["mvtn"].load_state_dict(
        checkpoint['mvtn'])
    models_bag["mvtn_optimizer"].load_state_dict(
        checkpoint['mvtn_optimizer'])

In lines 55-56, is_learning_views in setup is initialized like this:

setup["is_learning_views"] = setup["views_config"] in ["learned_offset",
                                                       "learned_direct", "learned_spherical", "learned_random", "learned_transfer"]

Should the learned_offset in line 55 be replaced by learned_circular? The choices for a learned views_config must be learned_circular, learned_spherical, learned_direct, learned_random, or learned_transfer.
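The proposed change can be sketched as follows. This is a minimal illustration, not the repository's code: the tuple name `LEARNED_VIEWS_CONFIGS` and the helper `is_learning_views` are hypothetical, and the sketch assumes the five option names listed above are the complete set of learned configurations (whether `learned_offset` is still a valid option is not known here).

```python
# Hypothetical sketch of the proposed fix; names below are illustrative only.
# Assumption: these five names are the full set of learned views_config options.
LEARNED_VIEWS_CONFIGS = (
    "learned_circular",
    "learned_spherical",
    "learned_direct",
    "learned_random",
    "learned_transfer",
)


def is_learning_views(views_config):
    """Return True when the views configuration is one of the learned ones."""
    return views_config in LEARNED_VIEWS_CONFIGS


# Same configuration as in the training command above.
setup = {"views_config": "learned_circular"}
setup["is_learning_views"] = is_learning_views(setup["views_config"])
print(setup["is_learning_views"])  # True
```

With the original list, which contains learned_offset instead of learned_circular, the same lookup would give False for this configuration.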

I am sorry if this is not the cause. I would appreciate it if you could tell me the correct way. :) @ajhamdi

Kumoi0728 commented 2 years ago

I checked the results of the continued training. Around the cut-off point at epoch 57, the camera position also changed a lot. At epoch 57, camera 0 looked like this: [image: MV_cameras_57] However, at epoch 60, camera 0 looked like this: [image: MV_cameras_60]

Judging by the other epochs, the camera position should not have changed this much. I think this is because the trained MVTN model was not loaded correctly when training continued.
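One way to check this hypothesis without running a full epoch is to list which checkpoint entries the guarded load would skip. The sketch below uses plain dictionaries as placeholders for the real checkpoint; the function name `entries_skipped_on_resume` is hypothetical, and only the keys `mvtn` and `mvtn_optimizer` and the list of option names are taken from the snippets quoted in this thread.

```python
def entries_skipped_on_resume(checkpoint, setup):
    """List checkpoint entries that the guarded load would not restore.

    Mirrors the quoted logic: 'mvtn' and 'mvtn_optimizer' are only loaded
    when setup["is_learning_views"] is True.
    """
    guarded_keys = ("mvtn", "mvtn_optimizer")
    if setup["is_learning_views"]:
        return []
    return [key for key in guarded_keys if key in checkpoint]


# The list as quoted from ops.py in this thread (no "learned_circular").
original_list = ["learned_offset", "learned_direct", "learned_spherical",
                 "learned_random", "learned_transfer"]

setup = {"views_config": "learned_circular"}
setup["is_learning_views"] = setup["views_config"] in original_list

# Placeholder checkpoint: empty dicts stand in for the saved state dicts.
checkpoint = {"mvtn": {}, "mvtn_optimizer": {}}

print(entries_skipped_on_resume(checkpoint, setup))
# ['mvtn', 'mvtn_optimizer']
```

A non-empty result for a learned configuration would mean the view-learning weights and their optimizer state restart from their initial values on resume, which would be consistent with the jump in camera position.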

ajhamdi commented 2 years ago

Yes @Kumoi0728, you are right. I think this is a bug in the code. I will look into it.

auniquesun commented 1 year ago

Recently, I have experimented with the code in this repo. I agree with you that in line 55 of ops.py, learned_offset should be replaced by learned_circular.

I found that even when training from scratch, the results are not satisfactory, as shown in the following figure. [image: 1668602288742]

In my case, I set views_config=learned_spherical and tested on ScanObjectNN. According to the code, the model adjusts the scene parameters to choose a better position from which to render the point clouds into images, then classifies the images using MVCNN. However, after 21 epochs, I only get 18.7% accuracy. I think the score is too low and the process is abnormal.

I have read the code and understand how it works. I did not find a bug in training or evaluation, but I am not sure whether I used the proper settings. The running command is shown in the following figure. [image: 1668602698259]

Do you have any insight or advice on the poor performance? Thanks. @ajhamdi @Kumoi0728