clinplayer / Point2Skeleton

Point2Skeleton: Learning Skeletal Representations from Point Clouds (CVPR2021)
MIT License

About the Reproducibility #2

Closed. FishWoWater closed this issue 3 years ago.

FishWoWater commented 3 years ago

Thank you for your nice work!

I have some concerns about reproducibility. I trained with the provided config and compared the results on the test set against your pretrained model (shown below; the left is the model I trained and the right is the model you provided). I only use the first-stage skel model. Also, I noticed that the training strategies in your two arXiv versions differ from each other. I uncommented the joint-training code in train.py and it quickly runs into NaN. Can you offer any ideas about this? Thanks in advance.

clinplayer commented 3 years ago

Hi, thanks for the interest!

I think the two results look largely similar. So what's the problem here with the reproducibility?

We also found that the joint training stage can easily lead to NaN values. I think the problem is caused by exploding gradients, since the network becomes very deep when the two networks are optimized together. If you do want to train the two models jointly, a potential solution could be adding more skip connections with larger spans in the GCN.

Actually, we find that joint training does not affect the final results much; it only slightly helps to smooth the mesh. Therefore, in the latest version, we simplified the training and removed the joint stage.
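A minimal sketch of what "skip connections with larger spans" could look like in a GCN stack, written in NumPy as a forward pass only. This is not the repository's network: `normalize_adjacency`, `gcn_forward`, and `skip_span` are hypothetical names, and it assumes every layer keeps the same feature width so activations can be summed.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops, so every degree is >= 1
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]


def gcn_forward(x, adj, weights, skip_span=2):
    """Run a stack of GCN layers with short and longer-span skip connections.

    Assumption: every weight matrix is square (d x d), so outputs of
    different layers have the same shape and can be added together.
    """
    a_norm = normalize_adjacency(adj)
    activations = [x]  # activations[0] is the input, activations[i] is layer i's output
    for i, w in enumerate(weights, start=1):
        h = np.maximum(a_norm @ activations[-1] @ w, 0.0)  # ReLU(A_norm H W)
        h = h + activations[-1]                            # span-1 residual connection
        if i - skip_span >= 0:
            h = h + activations[i - skip_span]             # longer-span skip connection
        activations.append(h)
    return activations[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    x = rng.normal(size=(3, 4))
    weights = [rng.normal(scale=0.1, size=(4, 4)) for _ in range(6)]
    print(gcn_forward(x, adj, weights, skip_span=3).shape)
```

The longer-span additions give later layers a direct path to earlier activations; whether that is enough to keep joint training free of NaN values would still have to be checked in the actual training code.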

FishWoWater commented 3 years ago

Thanks for your quick reply! I think the problem is that, for the meshes in the right group, the radii of some centers are extremely large.

clinplayer commented 3 years ago

Since we have no ground truth to constrain the radii, they can be incorrectly large, especially for skeletal points near the shape boundaries. I think this is because the contextual information around boundaries is not very stable, so the network may incorrectly correlate a skeletal point with surface points that are not in its local part.
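One way to spot such oversized radii without ground truth is to compare each predicted radius with the distance from its skeletal point to the nearest input surface point, since an inscribed sphere should not reach far beyond that distance. The sketch below is a diagnostic under that assumption, not part of the repository: `flag_oversized_radii` and the `tolerance` factor are hypothetical, and the brute-force nearest-point search is only meant for small inputs.

```python
import math


def nearest_surface_distance(center, surface_points):
    """Distance from one skeletal point to its closest surface point (brute force)."""
    return min(math.dist(center, p) for p in surface_points)


def flag_oversized_radii(centers, radii, surface_points, tolerance=1.5):
    """Return indices of skeletal points whose radius looks too large.

    A radius is flagged when it exceeds `tolerance` times the distance to
    the nearest surface point. `tolerance` is an illustrative threshold,
    not a value taken from the paper or the code.
    """
    flagged = []
    for i, (center, radius) in enumerate(zip(centers, radii)):
        if radius > tolerance * nearest_surface_distance(center, surface_points):
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    surface = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (6.0, 0.0, 0.0)]
    centers = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    radii = [1.0, 4.0]
    print(flag_oversized_radii(centers, radii, surface))
```

In the toy input, the second skeletal point sits one unit from the surface but carries a radius of 4, so it is the one reported.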

We did not carefully tune the hyperparameters... The left ones are your results? But it seems your results are better than mine and the problems are alleviated! XD

FishWoWater commented 3 years ago

@clinplayer HXD, the right ones are our results... I investigated further and found that the radii become incorrectly large after the radius refinement function. I suspect the points resampled from the reconstructed surface may be problematic.

clinplayer commented 3 years ago

What if you do not use radius refinement? Do the radii still become that large? If not, the problem may lie in the radius refinement. I will see if I can fix it.

FishWoWater commented 3 years ago

If I do not use radius refinement, the problem no longer occurs. Actually, radius refinement works well with your pretrained model but not with mine :(
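To narrow down which skeletal points the refinement step inflates, the radii can be compared before and after refinement. This is a rough sketch with hypothetical names (`radius_growth`, `max_ratio`); it assumes the two radius lists are already available in the same point order and does not call the repository's refinement function.

```python
def radius_growth(radii_before, radii_after, max_ratio=2.0):
    """Return (index, ratio) pairs where refinement grew a radius by more than max_ratio.

    `max_ratio` is an illustrative threshold. A non-positive radius before
    refinement is treated as infinite growth so that it is always reported.
    """
    suspicious = []
    for i, (before, after) in enumerate(zip(radii_before, radii_after)):
        ratio = after / before if before > 0 else float("inf")
        if ratio > max_ratio:
            suspicious.append((i, ratio))
    return suspicious


if __name__ == "__main__":
    before = [0.5, 0.25, 1.0]
    after = [0.5, 1.0, 1.5]
    for index, ratio in radius_growth(before, after):
        print(f"skeletal point {index}: radius grew {ratio:.1f}x after refinement")
```

Running the same comparison on the pretrained model and on a retrained model would show whether the inflated points are concentrated in the retrained one.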

clinplayer commented 3 years ago

In this case, I suppose the problem is still caused by the incorrect correlation between the skeletal points and surface points. I would suggest giving the skeletal prediction network more epochs to converge, e.g. set PRE_TRAIN_EPOCH = 30 and SKELPOINT_TRAIN_EPOCH = 40. If the issue still occurs, please send me your network weights and the unsatisfactory shapes.

FishWoWater commented 3 years ago

@clinplayer OK, I will try. Thanks for your help!