Lich-King000 opened 2 years ago
Thanks for your attention.
`self.pos_prob` controls the probability of drawing positive samples, and the SPM is mainly designed to eliminate the influence of extremely poor samples during inference.
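A minimal sketch of what a probability-controlled label sampler could look like. This is an illustrative assumption, not the repository's actual sampler code; the function name `sample_label` and its structure are made up for clarity:

```python
import random

def sample_label(pos_prob=0.5, rng=random):
    """Decide whether the next training pair is positive.

    With probability `pos_prob` the sampler builds a positive pair
    (label = 1); otherwise it builds a negative pair (label = 0).
    The *class* is drawn randomly, but the actual training sample
    would then be constructed to match that class.
    """
    return 1 if rng.random() < pos_prob else 0

# Over many draws, the fraction of positives approaches pos_prob:
labels = [sample_label(pos_prob=0.5) for _ in range(10000)]
print(sum(labels) / len(labels))  # close to 0.5
```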
Hello Dear Cui (@yutaocui),
The training of the SPM (Score Prediction Module) is performed separately, after MixFormer training is complete, by which point the backbone (MAM, the Mixed Attention Module) has already learned representations of the similarity between search tokens and target tokens. So why does SPM work effectively? I think it is because the ">0.5" bias aligns with the bias built into MAM's attention mechanism (where larger dot products indicate higher similarity). Is that right?
To select templates with higher similarity, you could directly calculate the similarity and apply a hard threshold of 0.5. However, you did not opt for that approach; instead, you designed SPM, whose structure closely resembles MAM's, and use the similarity it predicts for filtering. This is effectively a soft threshold.
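The hard-vs-soft threshold contrast described above can be sketched as follows. This is a toy illustration of the general idea, assuming scalar similarity scores; the function names are hypothetical and do not come from the MixFormer code:

```python
import math

def hard_filter(scores, threshold=0.5):
    # Hard threshold: a binary keep/discard decision per template.
    return [s > threshold for s in scores]

def soft_weights(scores, temperature=1.0):
    # Soft alternative: a sigmoid maps raw similarity to a
    # continuous confidence weight instead of a binary gate.
    return [1.0 / (1.0 + math.exp(-s / temperature)) for s in scores]

scores = [-2.0, 0.2, 1.5]
print(hard_filter(scores))   # [False, False, True]
print(soft_weights(scores))  # roughly [0.12, 0.55, 0.82]
```

The soft version preserves the ordering of the scores while avoiding an abrupt cutoff, which is the behavior the commenter attributes to a learned scoring module.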
I have a question about the training process in the SPM module. The selection of positive and negative training samples appears to be close to random. This training approach seems a bit hard to understand. Could you explain why you chose to do it this way?
I would appreciate it if you could answer my questions.
Using meaningless supervision signals can cause `Score_branch` to fail to converge (making it completely useless).
@yutaocui Thank you for your excellent work!
I have noticed that the labels used for training the SPM module seem to be generated randomly with a fixed `pos_prob`.
https://github.com/MCG-NJU/MixFormer/blob/90a6a9c9a9c874f56904796bab1ddf158948d4e3/lib/train/data/sampler.py#L216-L231
But the SPM module aims at selecting high-quality templates.
I am confused about how these random labels could play a role in selecting templates.
I would appreciate it if you could answer my questions.