Open azcdk opened 4 months ago
I have the same question. Could the author please answer it?
Thank you for your interest in our work. In subsequent experiments, we found that MoFME can outperform MoE without these two loss functions. To reduce computational load and improve model efficiency, we removed them from the code. We will continue exploring experimental settings in which the load balance loss and the uncertainty loss improve model performance.
Could the author please indicate which part of the code implements the uncertainty-aware component?
It seems that the load balance loss and the uncertainty loss are not used in MoFME. The code does have an l_aux, possibly an auxiliary loss, but it is initialized to 'None'.
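
For anyone who wants to experiment with re-enabling an auxiliary loss, here is a minimal sketch of one common load balance loss from the MoE literature (the Switch Transformer style, N * sum_i f_i * P_i). It is not taken from the MoFME code. The function names load_balance_loss and total_loss, the aux_weight parameter, and the default weight of 0.01 are all assumptions made for illustration. It uses plain Python so it runs without any framework.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balance_loss(router_logits):
    """Switch-Transformer-style load balance loss (hypothetical helper).

    router_logits: list of per-token lists, one logit per expert.
    Returns N * sum_i f_i * P_i, which equals 1.0 when routing is
    perfectly balanced and grows as tokens concentrate on few experts.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # f[i]: fraction of tokens whose top-1 expert is expert i.
    counts = [0] * num_experts
    for p in probs:
        counts[p.index(max(p))] += 1
    f = [c / num_tokens for c in counts]

    # P[i]: mean router probability assigned to expert i.
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]

    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


def total_loss(task_loss, l_aux=None, aux_weight=0.01):
    """Add the auxiliary loss only when it is provided.

    With l_aux left as None (as observed in the repository), the
    training objective reduces to the task loss alone.
    """
    if l_aux is None:
        return task_loss
    return task_loss + aux_weight * l_aux


# Two tokens, two experts: one token per expert (balanced routing).
balanced = [[5.0, 0.0], [0.0, 5.0]]
# Both tokens routed to expert 0 (imbalanced routing).
imbalanced = [[5.0, 0.0], [5.0, 0.0]]

print(load_balance_loss(balanced))
print(load_balance_loss(imbalanced))
print(total_loss(2.0))
print(total_loss(2.0, l_aux=load_balance_loss(imbalanced)))
```

In this sketch, the balanced case gives a loss of 1.0 and the imbalanced case gives a larger value, so minimizing the term pushes the router toward spreading tokens evenly across experts. Whether such a term helps MoFME is exactly what the author says is still being explored.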