chrischoy / fully-differentiable-deep-ndf-tf

Fully differentiable deep-neural decision forest in tensorflow
MIT License

alternating optimization and joint optimization are not equal #2

Closed gaopeng-eugene closed 6 years ago

gaopeng-eugene commented 6 years ago

In the ICCV paper, the authors only proposed alternating optimization trained with resilient SGD. Joint optimization is definitely not the right solution. If the authors had proposed joint optimization in the ICCV paper, the contribution would have been minor and it would not have won the ICCV best paper award.

chrischoy commented 6 years ago

Impeccable reasoning! Joint optimization is wrong because, otherwise, they wouldn't have won the award!

gaopeng-eugene commented 6 years ago

In the ICCV paper supplement, they prove that the alternating optimization for pi is convex and guaranteed to converge to a unique optimum within a few iterations. The information gain is very similar to that of random forests and decision trees. However, when you use joint optimization, none of these guarantees exist, unless you can give a proof. Please go to the authors' homepage and read the proof.
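For reference, the leaf update they analyze is roughly the following (my transcription from memory of the paper's update rule, so treat the exact notation as an approximation):

```latex
\pi_{\ell y}^{(t+1)}
  \;=\; \frac{1}{Z_\ell^{(t)}}
    \sum_{(x, y') \in \mathcal{T}} \mathbf{1}_{[y' = y]}\,
    \frac{\pi_{\ell y}^{(t)}\, \mu_\ell(x \mid \Theta)}
         {\mathbb{P}_T[\, y \mid x, \Theta, \pi^{(t)} \,]}
```

where \mu_\ell(x) is the probability of routing x to leaf \ell and Z_\ell^{(t)} normalizes each leaf distribution to sum to one.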

gaopeng-eugene commented 6 years ago

Another reason I find this paper interesting is the dynamic routing in capsule networks and the follow-up paper that uses EM routing. It seems these two lines of work are linked.

gaopeng-eugene commented 6 years ago

Another way to show that joint optimization is wrong is to visualize the distributions in the leaves. There are no patterns, because each leaf is only a feature representation that does not carry any semantic meaning. However, when you train with EM, samples from the same class follow similar routing paths, and the leaf distributions are more interpretable. Past experience shows that decision tree leaves are very easy to understand.
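For what it's worth, here is a minimal sketch of how one could eyeball the leaf distributions; `pi_value` and its shape are assumptions, not names from this repo:

```python
# Hypothetical visualization of the learned leaf class distributions.
# `pi_value` would be fetched from the trained model; here it is replaced
# by random Dirichlet draws just so the snippet runs on its own.
import matplotlib.pyplot as plt
import numpy as np

n_leaf, n_class = 16, 10
pi_value = np.random.dirichlet(np.ones(n_class), size=n_leaf)  # stand-in for pi

plt.imshow(pi_value, aspect="auto", cmap="viridis")
plt.xlabel("class")
plt.ylabel("leaf")
plt.colorbar(label="pi[leaf, class]")
plt.title("Leaf class distributions")
plt.show()
```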

gaopeng-eugene commented 6 years ago

http://www.dsi.unive.it/~srotabul/files/publications/CVPR2014a_supp.pdf

chrischoy commented 6 years ago

It seems like you read the analysis but have not yet understood what the disadvantages of EM are and why it requires a convexity analysis.

First, the EM algorithm, or alternating optimization in general, suffers from slow convergence. This is because the variables that are held fixed during each step slow down the convergence of the variable currently being optimized.

Second, alternating optimization is not guaranteed to converge to the global optimum. This is why additional analysis is required for alternating optimization.
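To make the slow-convergence point concrete, here is a toy example, unrelated to this repo's code, where exact alternating minimization of a strongly coupled quadratic crawls toward the optimum while a joint (Newton) step lands on it immediately; the function and constants are made up purely for illustration:

```python
# Minimize f(x, y) = (x - y)^2 + eps * (x + y)^2, whose minimum is (0, 0).
import numpy as np

eps = 0.01
x, y = 1.0, -1.0  # arbitrary starting point

for it in range(1, 2001):
    # Exact minimization over x with y fixed (set df/dx = 0).
    x = y * (1.0 - eps) / (1.0 + eps)
    # Exact minimization over y with x fixed (set df/dy = 0).
    y = x * (1.0 - eps) / (1.0 + eps)
    if max(abs(x), abs(y)) < 1e-6:
        print(f"alternating: converged after {it} sweeps")
        break
else:
    print(f"alternating: still at ({x:.2e}, {y:.2e}) after 2000 sweeps")

# A joint Newton step accounts for the coupling between x and y and
# reaches the minimum of this quadratic in a single update.
H = np.array([[2 + 2 * eps, -2 + 2 * eps],
              [-2 + 2 * eps, 2 + 2 * eps]])   # Hessian of f
g = H @ np.array([1.0, -1.0])                 # gradient at the starting point
step = np.linalg.solve(H, g)
print("joint Newton step reaches:", np.array([1.0, -1.0]) - step)
```

The alternating sweeps need a few hundred iterations to get close to (0, 0), while the joint step solves it in one shot; the stronger the coupling (smaller eps), the worse the gap.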

I explained the same thing in the README, so please read it before making random comments that do not make much sense.

gaopeng-eugene commented 6 years ago

Thank you so much for your answer. Have you ever compared joint optimization with alternating optimization? If joint optimization is the right answer, why did the authors choose the more complex and harder-to-implement alternating optimization?

gaopeng-eugene commented 6 years ago

The authors responded to me just now, saying that joint optimization is OK, but they only reported the EM-style method because it is more effective.

chrischoy commented 6 years ago

Okay, I'm not working on this project. Why don't you implement their method and compare? You just have to remove the softmax variables from the optimization and optimize them separately using the update rule they proposed. It wouldn't be that difficult, and this way you can actually contribute to the community. (:
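Something along these lines should work, but it's only a sketch under assumptions: the tensor names (`mu`, `y_onehot`, `pi`) and shapes are placeholders rather than this repo's variables, and the update reflects my reading of the paper's multiplicative rule:

```python
import tensorflow as tf

n_leaf, n_class = 16, 10

# Leaf class distributions, kept OUT of gradient-based training (trainable=False).
pi = tf.Variable(tf.fill([n_leaf, n_class], 1.0 / n_class), trainable=False)

# mu: probability of each sample reaching each leaf, produced by the routing network.
mu = tf.placeholder(tf.float32, [None, n_leaf])
y_onehot = tf.placeholder(tf.float32, [None, n_class])

# P_T[y | x] = sum_leaf mu_leaf(x) * pi[leaf, y]
p_y = tf.matmul(mu, pi)                                        # [batch, n_class]

# Gradient step on the routing/feature parameters only (pi is non-trainable).
loss = -tf.reduce_mean(
    tf.reduce_sum(y_onehot * tf.log(tf.maximum(p_y, 1e-12)), axis=1))
train_theta = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Closed-form multiplicative update for pi:
# pi[l, y] <- normalize_y( pi[l, y] * sum_x 1[y_x = y] * mu_l(x) / P_T[y_x | x] )
resp = y_onehot / tf.maximum(p_y, 1e-12)                       # [batch, n_class]
numer = pi * tf.matmul(mu, resp, transpose_a=True)             # [n_leaf, n_class]
pi_update = tf.assign(pi, numer / tf.reduce_sum(numer, axis=1, keep_dims=True))
```

In the training loop you would then alternate: run `train_theta` on mini-batches for a while, then run `pi_update` (possibly several times) using the routing probabilities over the training data, roughly as the paper describes.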

gaopeng-eugene commented 6 years ago

Sure, I will test it on ROI regions with joint optimization. Again, big thanks for the implementation. Looking forward to seeing your interesting work. 3D-R2N2 is very good 😊

gaopeng-eugene commented 6 years ago

Do you know of any paper that proposes boosted differentiable decision forests? Basically, each newly added tree would focus on the samples misclassified by the previous ones.