Noble-Lab / casanovo

De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model
https://casanovo.readthedocs.io
Apache License 2.0

Add brackets and hyphens to PTMs #201

Open wsnoble opened 1 year ago

wsnoble commented 1 year ago

I think we should modify how Casanovo reports PTMs in its predicted sequences. Ideally, the output would be compatible with the ProForma specification:

https://github.com/HUPO-PSI/ProForma/blob/master/SpecDocument/ProForma_v2_draft15_February2022.pdf

But even if we don't adopt Unimod IDs in our spec, we should at least make our formatting compatible with ProForma's mass-difference notation. The only change is that each modification mass is enclosed in square brackets, an N-terminal modification is followed by a hyphen, and a C-terminal modification is preceded by one. For example, -17.027C+57.021PSEC+57.021TC+57.021LDTVVR becomes [-17.027]-C[+57.021]PSEC[+57.021]TC[+57.021]LDTVVR.
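As a rough illustration of the conversion described above, here is a minimal Python sketch. It is not Casanovo or depthcharge code: the function name `to_proforma` is hypothetical, and it assumes that every modification is written as a signed decimal mass, that any masses at the very start of the string are N-terminal modifications, and that every other mass belongs to the residue it follows. C-terminal modifications are not handled.

```python
import re

# A signed mass delta such as "+57.021" or "-17.027".
_MASS = r"[+-]\d+(?:\.\d+)?"
_MASS_RE = re.compile(_MASS)
# One or more mass deltas at the very start of the sequence (assumed N-terminal).
_NTERM_RE = re.compile(rf"^(?:{_MASS})+")


def to_proforma(seq: str) -> str:
    """Rewrite a bare mass-delta peptide string in ProForma-style bracket notation."""
    n_term = ""
    match = _NTERM_RE.match(seq)
    if match:
        # Bracket each leading mass and terminate the N-terminal group with a hyphen.
        masses = _MASS_RE.findall(match.group(0))
        n_term = "".join(f"[{mass}]" for mass in masses) + "-"
        seq = seq[match.end():]
    # Every remaining mass follows the residue it modifies, so bracket it in place.
    body = _MASS_RE.sub(lambda m: f"[{m.group(0)}]", seq)
    return n_term + body


print(to_proforma("-17.027C+57.021PSEC+57.021TC+57.021LDTVVR"))
# [-17.027]-C[+57.021]PSEC[+57.021]TC[+57.021]LDTVVR
```

Stacked N-terminal masses (for example `+43.006-17.027`) are each bracketed separately before the hyphen under this sketch; whether that is the desired rendering is a decision for the actual implementation.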

wfondrie commented 1 year ago

The recent depthcharge update adds first-class support for ProForma.

Even mskb-style mods are now handled as ProForma internally.

wsnoble commented 1 year ago

Does that mean we can close this issue? I.e., will this get handled automatically once the new depthcharge changes are in place, or are additional changes on the casanovo side going to be necessary?

wfondrie commented 1 year ago

Yes, but let's leave the issue open and close it with that upgrade.

bittremieux commented 1 year ago

Note, though, that according to the official mzTab specification, the sequence column in the output file should contain the unmodified sequence, and modifications should be reported in a separate modifications column (pages 15–16). This is because mzTab predates ProForma. We currently violate this part of the spec as well, because it would be annoying to retokenize the predicted peptide sequences.

wsnoble commented 1 year ago

Comet (and Tide) solve this by producing three columns: one with the raw sequence, one with the sequence decorated with mods, and one with just the mods. I think this might be a nice thing to do in Casanovo.
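To make the three-column idea concrete, here is a small Python sketch of how one predicted sequence could be split. It is only an illustration: `split_columns` is a hypothetical helper, not part of Casanovo, Comet, or Tide. It assumes residues are single uppercase letters and modifications are signed decimal masses, and it records each modification as a `(position, mass)` pair where position 0 stands for the N-terminus and position `i` for the i-th residue. The exact string format for an mzTab modifications column would still need to be taken from the mzTab specification.

```python
import re

# Either a single residue letter or a signed mass delta such as "+57.021".
_TOKEN_RE = re.compile(r"([A-Z])|([+-]\d+(?:\.\d+)?)")


def split_columns(seq: str):
    """Return (raw sequence, decorated sequence, list of (position, mass) pairs)."""
    residues = []
    mods = []
    for match in _TOKEN_RE.finditer(seq):
        residue, mass = match.groups()
        if residue:
            residues.append(residue)
        else:
            # Number of residues seen so far: 0 means the mass precedes the
            # first residue (N-terminal); otherwise it is the 1-based index
            # of the residue the mass follows.
            mods.append((len(residues), mass))
    return "".join(residues), seq, mods


raw, decorated, mods = split_columns("-17.027C+57.021PSEC+57.021TC+57.021LDTVVR")
print(raw)   # CPSECTCLDTVVR
print(mods)  # [(0, '-17.027'), (1, '+57.021'), (5, '+57.021'), (7, '+57.021')]
```

Keeping the decorated string alongside the raw sequence means downstream tools that expect an unmodified sequence column and tools that expect an annotated peptide can both read the same output.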