Open wsnoble opened 1 year ago
One of the changes I made in the recent depthcharge update is first-class support for ProForma.
Even mskb-style mods are now handled as ProForma internally.
Does that mean we can close this issue? I.e., will this get handled automatically once the new depthcharge changes are in place, or are additional changes on the casanovo side going to be necessary?
Yes, but let's leave the issue open and close it with that upgrade.
Note, though, that according to the official mzTab specification, the sequence column in the output file should be the unmodified sequence, and modifications should be reported in a modification column (pages 15–16). This is because mzTab predates ProForma. We currently violate this part of the spec as well, because it would be annoying to retokenize the predicted peptide sequences.
Comet (and Tide) solve this by producing three columns: one with the raw sequence, one with the sequence decorated with mods, and one with just the mods. I think this might be a nice thing to do in Casanovo.
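To make the three-column idea concrete, here is a minimal sketch, not Casanovo's or Comet's actual code. It assumes the predicted peptide is already a ProForma-style string with mass-delta mods in square brackets and at most one N-terminal mod. The function name `split_columns` is hypothetical, and the `position-CHEMMOD:mass` layout of the mods column is only modeled loosely on the mzTab convention (position 0 for the N-terminus), so it should be checked against the spec before use.

```python
import re

# Optional N-terminal mass delta, e.g. "[-17.027]-" at the start of the string.
_NTERM = re.compile(r"^\[([+-]\d+(?:\.\d+)?)\]-")
# One residue letter, optionally followed by a bracketed mass delta.
_RESIDUE = re.compile(r"([A-Z])(?:\[([+-]\d+(?:\.\d+)?)\])?")


def split_columns(proforma: str) -> tuple[str, str, str]:
    """Return (raw sequence, decorated sequence, mods-only string).

    Hypothetical helper: the mods string format is an assumption,
    not something defined by Casanovo.
    """
    mods = []
    rest = proforma
    nterm = _NTERM.match(rest)
    if nterm:
        # Position 0 denotes the N-terminus.
        mods.append(f"0-CHEMMOD:{nterm.group(1)}")
        rest = rest[nterm.end():]

    residues = []
    pos = 0
    for m in _RESIDUE.finditer(rest):
        if m.start() != pos:  # a gap means there was text we could not parse
            raise ValueError(f"cannot parse {proforma!r} at offset {pos}")
        pos = m.end()
        residues.append(m.group(1))
        if m.group(2):
            # 1-based position of the residue that carries the mod.
            mods.append(f"{len(residues)}-CHEMMOD:{m.group(2)}")
    if pos != len(rest):
        raise ValueError(f"cannot parse {proforma!r} at offset {pos}")

    return "".join(residues), proforma, ",".join(mods)


raw, decorated, mods = split_columns(
    "[-17.027]-C[+57.021]PSEC[+57.021]TC[+57.021]LDTVVR"
)
```

Keeping the decorated string unchanged in the second column means the raw and mods columns can be derived after prediction, without retokenizing inside the model.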
I think we should modify how Casanovo reports PTMs in its predicted sequences. Ideally, we would be compatible with the ProForma specification:
https://github.com/HUPO-PSI/ProForma/blob/master/SpecDocument/ProForma_v2_draft15_February2022.pdf
But even if we don't adopt Unimod IDs in our spec, we should at least make our formatting compatible with the mass difference approach. The only change is that each modification mass is enclosed in square brackets, and N-term and C-term mods are followed or preceded by a hyphen, respectively. So for example,
-17.027C+57.021PSEC+57.021TC+57.021LDTVVR
gets turned into
[-17.027]-C[+57.021]PSEC[+57.021]TC[+57.021]LDTVVR
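The rewrite described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Casanovo or depthcharge code: the function name `to_proforma` is hypothetical, the input is assumed to contain single uppercase residue letters with signed numeric masses, and at most one N-terminal mass is supported. C-terminal mods are not handled, since the current format shown here has no example of one.

```python
import re

# A signed mass such as +57.021 or -17.027.
_MASS = r"[+-]\d+(?:\.\d+)?"
# Either a mass at the very start (N-terminal mod) or a residue letter
# with an optional mass immediately after it.
_TOKEN = re.compile(rf"(?P<nterm>^{_MASS})|(?P<aa>[A-Z])(?P<mod>{_MASS})?")


def to_proforma(peptide: str) -> str:
    """Convert a mass-annotated peptide string to ProForma mass-delta notation."""
    out = []
    pos = 0
    for m in _TOKEN.finditer(peptide):
        if m.start() != pos:  # a gap means there was text we could not parse
            raise ValueError(f"cannot parse {peptide!r} at offset {pos}")
        pos = m.end()
        if m.group("nterm"):
            # N-terminal mod: bracketed and followed by a hyphen.
            out.append(f"[{m.group('nterm')}]-")
        else:
            out.append(m.group("aa"))
            if m.group("mod"):
                # Residue mod: bracketed, directly after the residue.
                out.append(f"[{m.group('mod')}]")
    if pos != len(peptide):
        raise ValueError(f"cannot parse {peptide!r} at offset {pos}")
    return "".join(out)


converted = to_proforma("-17.027C+57.021PSEC+57.021TC+57.021LDTVVR")
```

For the example peptide above, `converted` is the bracketed string shown; unmodified peptides pass through unchanged.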