Roestlab / massformer

Tandem Mass Spectrum Prediction with Graph Transformers
BSD 2-Clause "Simplified" License

Transform signal #2

Closed Qiong-Yang closed 3 months ago

Qiong-Yang commented 1 year ago

May I ask why we need to perform log10over3 scaling on the spectrum signal?

adamoyoung commented 1 year ago

Hi Qiong-Yang,

Thanks for your interest. The precise mathematical definition of the transformation that we used is $y=\log_{10}(x+1)/3$. Empirically, we found that using this preprocessing on the training spectra improves our model's performance on the test spectra, so that's the main reason we include it.
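For reference, here is a minimal NumPy sketch of the formula above. It is not necessarily the repository's actual implementation. The function name `log10over3` is borrowed from the question, and the input scale (intensities normalized so the base peak is around 1000) is an assumption made for illustration.

```python
import numpy as np

def log10over3(intensities):
    """Apply y = log10(x + 1) / 3 elementwise to non-negative intensities."""
    x = np.asarray(intensities, dtype=float)
    # The +1 keeps zero-intensity bins at exactly zero after the transform.
    return np.log10(x + 1.0) / 3.0

# Assumed input scale: base peak near 1000, so the output lands near [0, 1].
scaled = log10over3([0.0, 9.0, 999.0])
```

Under that assumed scale, dividing by 3 maps an intensity of 999 to exactly 1, so the transformed spectrum stays roughly in the unit interval.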

Intuitively, the log transformation reduces the importance of very large peaks and increases the importance of very small peaks. This might make the model better at predicting low-intensity peaks that would otherwise not contribute very much to the loss. It is not uncommon for mass spectra to have low entropy (in other words, to contain only one or two high-intensity peaks), and in these cases it is very easy for the model to get a good score just by predicting those dominant peaks. The log transformation might improve the model's ability to accurately predict the smaller peaks in such spectra by increasing the penalty for missing them.
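To make the intuition concrete, the sketch below compares how large a small peak is relative to the base peak before and after the transformation. The two-peak spectrum and the helper name `transform` are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def transform(x):
    # Same formula as in the thread: y = log10(x + 1) / 3.
    return np.log10(np.asarray(x, dtype=float) + 1.0) / 3.0

# Hypothetical low-entropy spectrum: one dominant peak and one small peak.
spectrum = np.array([999.0, 9.0])

# Size of the small peak relative to the base peak on the raw scale.
raw_ratio = spectrum[1] / spectrum[0]

# The same ratio after the log transformation.
transformed = transform(spectrum)
log_ratio = transformed[1] / transformed[0]
```

Here the small peak is about 0.9% of the base peak on the raw scale but about 33% of it after the transformation, so an intensity-based loss would penalize missing it far more heavily.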

We were inspired by Zhu et al. to apply this transformation; I recommend giving their paper a read if you haven't already.