dtabb73 opened 3 years ago
In my view, this is "metadata" information on top of the actual protein sequence. In the current version of the specification, we decided to handle such cases with the INFO tag, providing the metadata there as free text.
Standardising every single annotation is unfeasible at this point, in my view.
I think all this is beyond the scope of ProForma 2.0. ProForma is designed to describe the molecule that (someone claims) yielded a spectrum. Information about:
Basically, ProForma is about what you think you have observed, not about what you infer about the context of that observation. (There is one minor exception: for I/L and a few other isobaric ambiguities, ProForma lets the user write one residue, but it is implied that the isobaric alternatives are possible.)
I would like to see a paragraph in the specification indicating how proteoform sequence truncations are to be specified. N-terminal truncations may be biological, as in the removal of the initial Met (perhaps with a PTM), the cleavage of a signal peptide, or the action of a viral protease. The truncations may instead be related to sample treatment, such as a rare cutter like CNBr for middle-down proteomics, or to a "hot" ion source. I believe ProForma should specify how a proteoform sequence compares to the sequence described by the accession, for example by indicating the positions of the first and last amino acids in the accession's sequence. Are the amino acids preceding and following the proteoform sequence expected to be included?
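To make the request concrete, here is a minimal sketch of the information being asked for: the 1-based first and last positions of a proteoform within its accession's sequence, plus the flanking residues. This is not part of ProForma; `locate_proteoform` and `SpanInAccession` are hypothetical names, and the sketch assumes the proteoform has already been reduced to a bare residue string (modifications stripped, no I/L or other isobaric ambiguity handling).

```python
from typing import List, NamedTuple, Optional


class SpanInAccession(NamedTuple):
    """Hypothetical record: 1-based inclusive coordinates of a proteoform in its accession."""
    start: int                 # accession position of the first proteoform residue
    end: int                   # accession position of the last proteoform residue
    preceding: Optional[str]   # residue just before start; None at the protein N-terminus
    following: Optional[str]   # residue just after end; None at the protein C-terminus


def locate_proteoform(accession_seq: str, proteoform_seq: str) -> List[SpanInAccession]:
    """Return every (possibly overlapping) occurrence of the bare proteoform sequence."""
    if not proteoform_seq:
        raise ValueError("proteoform sequence must not be empty")
    hits = []
    i = accession_seq.find(proteoform_seq)
    while i != -1:
        j = i + len(proteoform_seq)  # 0-based exclusive end of the match
        hits.append(SpanInAccession(
            start=i + 1,
            end=j,
            preceding=accession_seq[i - 1] if i > 0 else None,
            following=accession_seq[j] if j < len(accession_seq) else None,
        ))
        # Search again one residue later so overlapping matches are reported too.
        i = accession_seq.find(proteoform_seq, i + 1)
    return hits
```

For an invented accession sequence `MKTAYIAKQR` with the initiator Met removed, `locate_proteoform("MKTAYIAKQR", "KTAYIAK")` yields a single span with start 2, end 8, preceding `M` and following `Q`. Returning a list rather than one span reflects that a short sequence can occur more than once in an accession, which is one reason explicit coordinates would be less ambiguous than the sequence alone.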