TzuchengChang / NASS

Noise-Aware Speech Separation with Contrastive Learning

Adding some value on the diagonal #1

Closed MordehayM closed 6 months ago

MordehayM commented 6 months ago

Hi, can you please explain why you set the diagonal of the cosine similarity matrix between the query and the noise embeddings to -10?

https://github.com/TzuchengChang/NASS/blob/ab98c434a5b51ff0ebfd7d3a97a9876135fa52dd/speechbrain/speechbrain/nnet/patchnce.py#L34

Thanks

TzuchengChang commented 6 months ago

The elements on the diagonal represent the similarity between identical features, making them redundant and insignificant. It is sufficient to fill the diagonal with -10, so that each diagonal entry contributes only about exp(-10) ≈ 4.5e-5 to the softmax, which is virtually zero. You can also use an even smaller value.
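
To see how little a logit of -10 contributes, here is a minimal sketch in plain Python. The logit values are made up for illustration and are not taken from the NASS code; the `softmax` helper is written out by hand rather than borrowed from a library.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one query: a positive followed by three negatives.
logits = [0.9, 0.8, 0.1, -0.2]

# Replace the same-position negative (index 1 here) with -10,
# mimicking what the diagonal fill does for one row.
masked = list(logits)
masked[1] = -10.0

weights_unmasked = softmax(logits)
weights_masked = softmax(masked)

print("weight before fill:", weights_unmasked[1])
print("weight after fill: ", weights_masked[1])
```

After the fill, the masked entry keeps a nonzero weight, but it is several orders of magnitude smaller than the others, so it has almost no influence on the loss.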

MordehayM commented 6 months ago

Why do you say identical features? The similarity is between the query and the noise, so on the diagonal it's not identical representations.

TzuchengChang commented 6 months ago

It's true that they are not exactly the same, but they may be very similar in the early stages of training. Our idea is to take positive examples from patches at the same position and negative examples from patches at different positions, which may yield better results. The code does not use the same-position patch as a negative example, so it fills that entry with -10, which makes its softmax weight effectively zero. You can try leaving the diagonal unmasked to see whether that helps :)
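
For anyone who wants to try that comparison, below is a rough sketch of the idea in plain Python, not the actual `patchnce.py` implementation. The function name `patch_nce_loss`, the argument names, the `mask_diagonal` switch, and the toy vectors are all assumptions made for illustration. Temperature scaling is omitted, and the positive is assumed to come from a separate set of keys at the same position.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def patch_nce_loss(queries, pos_keys, noise_keys, fill=-10.0, mask_diagonal=True):
    """Mean cross-entropy over patches, with the positive as the target class."""
    losses = []
    for i, q in enumerate(queries):
        # Positive: key at the same position as the query.
        l_pos = cosine(q, pos_keys[i])
        # Negatives: query against every noise patch.
        l_neg = [cosine(q, k) for k in noise_keys]
        if mask_diagonal:
            # Suppress the same-position noise patch (the "diagonal" entry).
            l_neg[i] = fill
        logits = [l_pos] + l_neg
        # Stable log-sum-exp, then cross-entropy with the positive as target.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum - l_pos)
    return sum(losses) / len(losses)

# Toy two-patch example with hypothetical 2-D embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
pos_keys = [[1.0, 0.1], [0.1, 1.0]]
noise_keys = [[1.0, 0.2], [-1.0, 0.5]]

masked_loss = patch_nce_loss(queries, pos_keys, noise_keys)
unmasked_loss = patch_nce_loss(queries, pos_keys, noise_keys, mask_diagonal=False)
print("masked:  ", masked_loss)
print("unmasked:", unmasked_loss)
```

With `mask_diagonal=False` the same-position noise patch competes as a regular negative, so the loss is higher. Whether that helps or hurts training would need to be checked empirically.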