Closed FishWoWater closed 3 years ago
Hi, thanks for the interest!
I think the two results look largely similar. So what's the problem here with the reproducibility?
We also found that the joint training stage can easily lead to NaN values. I think the problem is caused by exploding gradients, since the network becomes very deep when the two networks are optimized together. If you do want to train the two models jointly, a potential solution could be adding more skip connections with larger spans in the GCN.
Actually, we find that joint training does not affect the final results much; it only slightly helps to smooth the mesh. Therefore, in the latest version, we simplified the training and removed the joint stage.
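For readers who still want to experiment with the joint stage, a generic guard against exploding gradients is sketched below. This is not the fix suggested in this thread (that was extra skip connections in the GCN); it is the common gradient-norm clipping technique plus a finiteness check, written in plain Python on a flat list of gradient values. The names `global_grad_norm`, `safe_clip` and `max_norm` are hypothetical and do not come from the repository.

```python
import math

def global_grad_norm(grads):
    """L2 norm over all gradient values (a flat list of floats)."""
    return math.sqrt(sum(g * g for g in grads))

def safe_clip(grads, max_norm=1.0):
    """Return (grads, ok). ok is False if any gradient is NaN or infinite.

    Hypothetical helper: when ok is False the caller is expected to skip
    the update step instead of applying the gradients.
    """
    if not all(math.isfinite(g) for g in grads):
        return list(grads), False
    norm = global_grad_norm(grads)
    if norm <= max_norm:
        return list(grads), True  # already within the allowed norm
    scale = max_norm / norm
    return [g * scale for g in grads], True

# Usage: a gradient of norm 5 is rescaled to norm 1.
clipped, ok = safe_clip([3.0, 4.0], max_norm=1.0)
print(ok, global_grad_norm(clipped))
```

In a real training loop the same idea would be applied to the framework's gradient tensors right before the optimizer step; the threshold is a hyperparameter that would need tuning.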
Thanks for your quick reply! I think the problem is that for the meshes in the right group, the radii of some centers are extremely large.
Since we do not have any ground truth to constrain the radii, they can become far too large, especially for the skeletal points around the shape boundaries. I think this is because the contextual information around boundaries is not very stable, so the network may incorrectly correlate a skeletal point with some surface points that are not in this local part.
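One way to quantify "incorrectly large" is to compare each predicted radius against the overall size of the shape. The sketch below is a diagnostic only, assuming the surface points are available as 3D tuples and the radii as a list of floats; `bbox_diagonal`, `flag_large_radii` and the `max_fraction` threshold are hypothetical and not part of the repository.

```python
import math

def bbox_diagonal(points):
    """Length of the axis-aligned bounding-box diagonal of a 3D point set."""
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    return math.dist(mins, maxs)

def flag_large_radii(radii, surface_points, max_fraction=0.5):
    """Indices of skeletal points whose radius exceeds a fraction of the
    shape's bounding-box diagonal (the threshold is an assumed heuristic)."""
    limit = max_fraction * bbox_diagonal(surface_points)
    return [i for i, r in enumerate(radii) if r > limit]

# Usage: a unit cube has a diagonal of about 1.73, so the limit is about 0.87.
surface = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
print(flag_large_radii([0.1, 0.9, 2.0], surface))
```

Checking whether the flagged indices cluster near the shape boundary would help confirm or rule out the explanation given above.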
We did not carefully tune the hyperparameters... The left ones are your results? But it seems your results are better than mine and the problems are alleviated! XD
@clinplayer HXD, the right ones are our results... I looked into the problem further and found that the radii become incorrectly large after the radius refinement function. I suspect the points resampled from the reconstructed surface may be causing the problem.
What if you do not use radius refinement? Do the radii still become that large? If so, the problem may lie in the radius refinement. I will try to see if I can fix it.
If I do not use radius refinement, the problem no longer exists. Actually, radius refinement works well with your pretrained model but not with mine :(
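The with/without comparison described here can be made concrete by measuring how much each radius grows during refinement. The following is a minimal sketch, assuming the radii before and after refinement have been dumped as two equally long lists of floats; `radius_growth` and `worst_growth` are hypothetical helpers, not functions from the repository.

```python
def radius_growth(before, after, eps=1e-8):
    """Per-point ratio after/before; eps guards against division by zero."""
    return [a / max(b, eps) for b, a in zip(before, after)]

def worst_growth(before, after, top_k=5):
    """The top_k (index, ratio) pairs with the largest radius growth."""
    ratios = radius_growth(before, after)
    order = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
    return [(i, ratios[i]) for i in order[:top_k]]

# Usage: point 1 grows 4x and point 2 grows 2x during refinement.
before = [0.5, 0.25, 1.0]
after = [0.5, 1.0, 2.0]
print(worst_growth(before, after, top_k=2))
```

Running the same measurement on the pretrained weights and on the retrained weights would show whether refinement inflates the same skeletal points in both cases.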
In this case, I suppose the problem is still caused by the incorrect correlation between the skeletal points and surface points. I would suggest giving the skeletal prediction network more epochs to converge, e.g. set PRE_TRAIN_EPOCH = 30, SKELPOINT_TRAIN_EPOCH = 40. If the issue still occurs, please send me your network weights and the unsatisfactory shapes.
@clinplayer OK, I will try. Thanks for your help!
Thank you for your nice work!
I have some concerns about reproducibility. I trained with the provided config and compared the results on the test set against your pretrained model (shown below; the left is the model I trained and the right is the model you provided. I only use the first-stage skel model). Also, I noticed that the training strategies in your two arXiv versions differ from each other. I uncommented the joint training code in train.py, and training easily runs into NaN. Could you share some ideas about this? Thanks in advance.