Cyanogenoid / dspn

[NeurIPS 2019] Deep Set Prediction Networks
https://arxiv.org/abs/1906.06565
MIT License

Sparse output for MLPDecoder (for MNIST-set experiments)? #4

Closed · sangwoomo closed this issue 4 years ago

sangwoomo commented 4 years ago

Hi, thank you for the fast response! I tested the MNIST-set experiments with MLPDecoder and found that the output cardinality tends to become small when using the Chamfer loss. It is understandable that the Chamfer loss only needs to keep "a few" points to be minimised (several target points can map to the same output point), but I just want to check whether this phenomenon also happened for you (to rule out a problem on my side, e.g., with my environment). Also, is it common to use an additional regulariser that encourages the output and target to have a similar number of points (i.e., penalising || # of pred masks − # of target masks ||)? I'm pretty new to this domain, so thank you for your kind help! :)
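For concreteness, the kind of regulariser I have in mind is something like this (just a sketch; `pred_mask`, `target_mask`, and `reg_weight` are my own placeholder names, not anything from this repo):

```python
import torch

def chamfer_loss(pred, target):
    # pred: (batch, n_pred, dim), target: (batch, n_target, dim)
    dist = torch.cdist(pred, target, p=2) ** 2
    # Each target point is pulled to its nearest prediction and vice versa,
    # so a handful of predictions can "cover" all of the targets.
    return dist.min(dim=1)[0].mean() + dist.min(dim=2)[0].mean()

def cardinality_penalty(pred_mask, target_mask):
    # Soft number of predicted points vs. number of target points.
    return (pred_mask.sum(-1) - target_mask.sum(-1)).abs().mean()

# total = chamfer_loss(pred_points, target_points) \
#         + reg_weight * cardinality_penalty(pred_mask, target_mask)
```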

Cyanogenoid commented 4 years ago

MLPDecoder producing too few points is normal and happens for me too, especially when the MLP's dimensions are somewhat small. With the hyperparameters in scripts/mnist.sh, do you get results that look similar to Appendix C?

I haven't seen any papers that have such a regularisation term, but I don't think that there is any particular reason against it.

Cyanogenoid commented 4 years ago

Oh, I just remembered: you have to use the testing git branch, not master. The testing branch biases the masks towards 1 so that the model is less likely to get stuck in the local optimum of predicting too few points. Let me know if that fixes it.
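The change is essentially just an initialisation tweak on the decoder's mask output; roughly like this (the names and sizes here are illustrative, not the actual code in the testing branch):

```python
import torch.nn as nn

hidden_dim, max_set_size = 256, 342  # illustrative sizes

# Hypothetical mask head of the MLP decoder: one mask value per potential point.
mask_head = nn.Linear(hidden_dim, max_set_size)

# With the bias at 0, the decoder easily gets stuck with most masks switched
# off; starting the bias at 1 means all points begin "on" and have to be
# actively turned off instead.
nn.init.constant_(mask_head.bias, 1.0)
```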

sangwoomo commented 4 years ago

Hi, sorry for the late response! Adding 1 to the bias seems to work okay. In particular, I checked the (soft) number of masks via masks[-1].sum(-1).mean() and found that initialising the bias to 0 (master branch) collapses to fewer than 20 masks, while initialising it to 1 (testing branch) goes to about 340 masks (almost the maximum). I also checked the Hungarian loss, and there the number of masks indeed goes to 130, which is close to the correct number of points. It seems that learning the cardinality of sets (especially with the Chamfer loss) is extremely unstable! Do you know of any related work on this issue? It seems like an interesting problem to handle.
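For reference, the Hungarian loss I checked was along these lines (a rough sketch using SciPy's assignment solver, not necessarily the exact implementation in this repo):

```python
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_loss(pred, target):
    # pred, target: (batch, n, dim) point sets padded to the same size.
    # Every prediction is matched to exactly one target, so predictions
    # cannot collapse onto a few shared points as with Chamfer.
    losses = []
    for p, t in zip(pred, target):
        cost = torch.cdist(p, t, p=2) ** 2
        row, col = map(torch.as_tensor,
                       linear_sum_assignment(cost.detach().cpu().numpy()))
        losses.append(cost[row, col].mean())
    return torch.stack(losses).mean()
```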

Cyanogenoid commented 4 years ago

For point-cloud generation of 3D objects, people usually sample a fixed number of points, e.g. 2048 points per shape. That's why they can get away without using any masks (the output size is always 2048 after all), so this issue with the Chamfer loss hasn't really come up in those contexts. https://arxiv.org/abs/2001.11845 has an output that, as far as I understand, does a classification over the set sizes, though they use the Hungarian loss anyway.
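As I read it, the rough idea is a separate head that classifies the cardinality, something like this (my own sketch, not their code; the names and sizes are made up):

```python
import torch.nn as nn
import torch.nn.functional as F

latent_dim, max_set_size = 256, 342  # illustrative sizes

# Predict the set size as a classification over all possible cardinalities,
# rather than reading it off soft masks trained through the set loss.
size_head = nn.Linear(latent_dim, max_set_size + 1)

def size_loss(latent, target_size):
    # latent: (batch, latent_dim), target_size: (batch,) integer cardinalities
    return F.cross_entropy(size_head(latent), target_size)

# At inference time, keep size_head(latent).argmax(-1) output points.
```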

sangwoomo commented 4 years ago

Thank you for the kind response! As you said, the set size does not seem to matter much for point clouds. Investigating proper applications with variable-sized sets (e.g., object detection, as you suggested) looks like an interesting research question. Thanks a lot for your helpful comments! :)