Hi,
Thanks for releasing the code! I tried using only your div loss and norm loss on my own dataset, but the losses don't seem to behave as expected. If I may ask for help, my questions are:
I use $\alpha L_{div} + \beta L_{norm}$, with $\alpha$ and $\beta$ both set to 0.2 as in your paper. During training, $L_{norm}$ increased a bit from 0 and then stopped changing at around 0.04 or 0.05 (as the branches are pushed to be different); at the same time, $L_{div}$ decreased from 0.36 at the beginning and stopped at 0.05. Do these loss values look reasonable in your experience? If not, can you think of any mistake I might have made that would cause this?
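To make the setup concrete, here is a minimal sketch of the combined regulariser in plain Python. The forms of both terms are assumptions for illustration (mean pairwise cosine similarity between branch outputs for the diversity term, mean absolute deviation of branch norms from their average for the norm term) and may not match the paper's exact definitions. The names `diversity_loss`, `norm_loss`, and `regulariser` are hypothetical, and each branch is represented as one flattened activation vector.

```python
import math
from itertools import combinations

ALPHA = 0.2  # weight of the diversity term, as stated in the question
BETA = 0.2   # weight of the norm term, as stated in the question


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity; eps avoids division by zero for all-zero vectors."""
    return sum(a * b for a, b in zip(u, v)) / (l2_norm(u) * l2_norm(v) + eps)


def diversity_loss(branches):
    """Assumed form: mean cosine similarity over all pairs of branches."""
    pairs = list(combinations(branches, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def norm_loss(branches):
    """Assumed form: mean absolute deviation of branch norms from their mean."""
    norms = [l2_norm(b) for b in branches]
    mean_norm = sum(norms) / len(norms)
    return sum(abs(n - mean_norm) for n in norms) / len(norms)


def regulariser(branches):
    """Weighted sum alpha * L_div + beta * L_norm."""
    return ALPHA * diversity_loss(branches) + BETA * norm_loss(branches)


# Toy inputs: two branches with identical vs. orthogonal activations.
identical = [[1.0, 0.0], [1.0, 0.0]]
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
unequal = [[3.0, 4.0], [0.0, 1.0]]
```

Under these assumed definitions, identical branches give a diversity term close to 1 and a norm term of 0, which is consistent with the diversity term starting high and the norm term starting near 0 when all branches begin with similar outputs.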
As for $L_{mil}$, I am a little confused: did you use the mean of all branches as the final T-CAM when computing the loss, or did you calculate the classification loss in each branch separately?
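The two options in this question can be sketched as follows. This is only an illustration of the difference, not a statement of which one the paper uses. It assumes temporal mean pooling to turn a T-CAM into video-level class scores and a softmax cross-entropy classification loss; the function names are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def video_scores(tcam):
    """Temporal mean pooling (an assumption): T-CAM [T][C] -> scores [C]."""
    num_steps = len(tcam)
    num_classes = len(tcam[0])
    return [sum(step[c] for step in tcam) / num_steps for c in range(num_classes)]


def mil_loss_on_mean_tcam(branch_tcams, label):
    """Option A: average the T-CAMs over branches, then compute one loss."""
    k = len(branch_tcams)
    num_steps = len(branch_tcams[0])
    num_classes = len(branch_tcams[0][0])
    mean_tcam = [
        [sum(b[t][c] for b in branch_tcams) / k for c in range(num_classes)]
        for t in range(num_steps)
    ]
    return softmax_cross_entropy(video_scores(mean_tcam), label)


def mil_loss_per_branch(branch_tcams, label):
    """Option B: compute one loss per branch, then average the losses."""
    losses = [softmax_cross_entropy(video_scores(b), label) for b in branch_tcams]
    return sum(losses) / len(losses)


# Toy input: 2 branches, 2 time steps, 2 classes; the branches disagree.
branch_tcams = [
    [[2.0, 0.0], [2.0, 0.0]],
    [[0.0, 2.0], [0.0, 2.0]],
]
loss_a = mil_loss_on_mean_tcam(branch_tcams, label=0)
loss_b = mil_loss_per_branch(branch_tcams, label=0)
```

The two options are not equivalent: option A only requires the averaged T-CAM to classify the video correctly, so individual branches can specialise, while option B penalises every branch that fails on its own. In the toy input, where the branches disagree, option B gives the larger loss.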
Thanks for reading. Looking forward to your response!
June