Noble-Lab / casanovo

De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model
https://casanovo.readthedocs.io
Apache License 2.0

Exclude some PTMs for prediction? #308

Closed JannikSchneider12 closed 8 months ago

JannikSchneider12 commented 8 months ago

Hello everyone,

I wanted to use the model checkpoint to predict some of my mgf files. However, I don't want to use every PTM that the model was trained on, and would like to exclude some. I changed the config file and tried to predict my data, but I am already getting this warning:

"UserWarning: Mismatching residues parameter in model checkpoint"

And I see that all the original PTMs still occur in my output. So my question is whether it is possible to exclude some PTMs without retraining everything from scratch?

Thanks for your time and help

melihyilmaz commented 8 months ago

Hi Jannik,

There's no way to omit certain PTMs or amino acids by changing the config file and, as you noted, a new model would need to be trained from scratch with a subset of the modifications.

A hacky way of avoiding prediction of certain PTMs without retraining could be to locally implement a masking tensor, similar to active_mask in _get_topk_beams(), that zeroes out the probabilities for a subset of tokens. However, this is probably non-trivial.
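To illustrate the masking idea in isolation, here is a minimal, library-free sketch. It is not Casanovo code: the function name mask_token_probs, the toy vocabulary, and the token spellings are assumptions for illustration, and a real change would have to operate on the score tensors inside the beam search rather than on Python lists.

```python
def mask_token_probs(probs, vocab, excluded):
    """Zero out the probabilities of excluded tokens and renormalize.

    probs    -- per-token probabilities for one decoding step
    vocab    -- token strings, aligned with probs (hypothetical spellings)
    excluded -- set of token strings that must never be predicted
    """
    # 1.0 keeps a token eligible, 0.0 suppresses it.
    mask = [0.0 if token in excluded else 1.0 for token in vocab]
    masked = [p * m for p, m in zip(probs, mask)]

    total = sum(masked)
    if total == 0.0:
        raise ValueError("every token was masked out")

    # Renormalize so the remaining tokens form a distribution again.
    return [p / total for p in masked]


# Toy decoding step: the modified methionine is the top-scoring token.
vocab = ["A", "M", "M+15.995", "N+0.984"]
probs = [0.1, 0.2, 0.4, 0.3]
masked_probs = mask_token_probs(probs, vocab, excluded={"M+15.995"})
best_token = vocab[masked_probs.index(max(masked_probs))]
```

With the modified methionine masked, the next-best token is selected instead. Applying the same mask at every decoding step is what would keep the excluded tokens out of the final peptides.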

Finally, based on our observations when developing the model, I suspect that including all default Casanovo PTMs may not hurt de novo sequencing performance much on your dataset. So, if you haven't tried these already, you could simply strip the PTMs you don't expect to see from the predictions, or filter out the predictions that contain them entirely.

JannikSchneider12 commented 8 months ago

Hey,

Thank you for your answer and explanations. Then I will try to parse the output so that it only includes my PTM subset.
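A minimal sketch of that post-filtering step, assuming the predicted peptides have already been read from the output file into a list of strings, and assuming modifications are written as mass shifts attached to a residue (for example M+15.995) or placed at the start of the sequence for N-terminal modifications (for example +42.011). The helper names and the example sequences are hypothetical.

```python
import re

# A token is either a residue with one or more mass shifts attached
# (or a bare N-terminal shift), or a single unmodified residue.
TOKEN_PATTERN = re.compile(r"[A-Z]?(?:[+-][0-9.]+)+|[A-Z]")


def modifications_in(sequence):
    """Return the set of modified tokens found in a peptide string."""
    tokens = TOKEN_PATTERN.findall(sequence)
    return {token for token in tokens if "+" in token or "-" in token}


def filter_peptides(sequences, allowed_mods):
    """Keep only peptides whose modifications are all in allowed_mods."""
    return [seq for seq in sequences if modifications_in(seq) <= allowed_mods]


# Assumed notation and example data, for illustration only.
predictions = [
    "PEPTIDE",
    "PEPTM+15.995IDE",
    "+42.011PEPTIDE",
    "PEN+0.984K",
    "C+57.021K",
]
kept = filter_peptides(predictions, allowed_mods={"C+57.021", "M+15.995"})
```

Here kept retains the unmodified peptide and the peptides that carry only the allowed modifications. Note that this discards whole predictions; it does not make the model re-predict the affected spectra without the excluded PTMs.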