YangSun22 / TC-MoA

Task-Customized Mixture of Adapters for General Image Fusion (CVPR 2024)

some question about expert #6

Closed. sove45 closed this issue 4 months ago.

sove45 commented 5 months ago

Excuse me, I noticed this code at line 310 of MMOE.py: expert_outputs = [self.experts[i](expert_inputs[i]) for i in range(self.num_experts)]. Does this mean all 4 experts are computed even if one entry of expert_inputs is a zero-token matrix? (In theory, expert_inputs can contain a 0-token matrix.)

sove45 commented 5 months ago

Fortunately, we didn't encounter a 0-token matrix during the training phase.

YangSun22 commented 5 months ago

This part of the code is inherited from this MMoE work and is fine both in practice and in theory. Understanding it may take a few more readings, followed by running test cases that print the intermediate results.
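To make the zero-token case concrete, here is a minimal, framework-free sketch of the dispatch / per-expert call / combine pattern being discussed. It is not the code from MMOE.py: the helpers make_expert, dispatch and combine are hypothetical, the "experts" are plain scaling functions rather than networks, and routing is simplified to one hard-assigned expert per token with no gate weights. Under those assumptions it shows that an expert receiving zero tokens is still called, returns an empty result, and contributes nothing to the combined output.

```python
from typing import Callable, List, Tuple

Token = List[float]


def make_expert(scale: float) -> Callable[[List[Token]], List[Token]]:
    """Hypothetical stand-in for an expert network: scales every feature."""
    def expert(tokens: List[Token]) -> List[Token]:
        # For an empty batch this comprehension simply yields an empty list.
        return [[scale * x for x in tok] for tok in tokens]
    return expert


def dispatch(tokens: List[Token], assignments: List[int],
             num_experts: int) -> Tuple[List[List[Token]], List[List[int]]]:
    """Group tokens by assigned expert and remember their original positions."""
    inputs: List[List[Token]] = [[] for _ in range(num_experts)]
    positions: List[List[int]] = [[] for _ in range(num_experts)]
    for pos, (tok, expert_id) in enumerate(zip(tokens, assignments)):
        inputs[expert_id].append(tok)
        positions[expert_id].append(pos)
    return inputs, positions


def combine(expert_outputs: List[List[Token]], positions: List[List[int]],
            num_tokens: int, dim: int) -> List[Token]:
    """Scatter each expert's outputs back to the original token order."""
    combined = [[0.0] * dim for _ in range(num_tokens)]
    for outputs, expert_positions in zip(expert_outputs, positions):
        # An expert with no tokens has empty lists here, so this loop is a no-op.
        for tok, pos in zip(outputs, expert_positions):
            combined[pos] = tok
    return combined


num_experts = 4
experts = [make_expert(float(i + 1)) for i in range(num_experts)]
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
assignments = [0, 2, 0]  # experts 1 and 3 receive zero tokens

expert_inputs, positions = dispatch(tokens, assignments, num_experts)
# Same shape as the line in question: every expert is called, even on empty input.
expert_outputs = [experts[i](expert_inputs[i]) for i in range(num_experts)]
combined = combine(expert_outputs, positions, len(tokens), dim=2)

print([len(x) for x in expert_inputs])  # tokens received per expert
print(combined)
```

The same reasoning carries over to tensors: in PyTorch, a layer such as nn.Linear applied to an input with zero rows returns an output with zero rows, so the per-expert call does very little work for an expert that received no tokens.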

sove45 commented 4 months ago

I'm sorry, this was my mistake. The NaN matrix I saw was caused by gradient explosion during training, not by differentiation producing a None matrix.