OpenMS / OpenMS

The codebase of the OpenMS project
https://www.openms.de

Support "non-existing" amino acids BJXZ #7554

Closed fcyu closed 3 months ago

fcyu commented 3 months ago

In some cases, people use the "non-existing" amino acid letters to encode modified amino acids in certain peptides, such as SILAC-labeled Pierce iRT peptides: https://github.com/Nesvilab/FragPipe/issues/1673#issuecomment-2230833796. Although FragPipe works well from database searching to FDR filtering, the spectral library generation module, EasyPQP, crashes due to an error from OpenMS: "RuntimeError: the value 'B' was used but is not valid; Modification '': origin must be a letter from A to Y, excluding B and J." See https://github.com/Nesvilab/FragPipe/issues/1673#issuecomment-2248996234

I am wondering if it is possible to support these four amino acids by extending the characters to all 26 letters.

Thanks,

Fengchao

timosachsenberg commented 3 months ago

Hi, thanks for reporting. Let me briefly elaborate: by default, OpenMS ensures that residues have unique properties such as mass and composition. This trickles down to many parts of the library. From the issue, it first sounded as if an extended residue set is needed, which we support in principle (just for reference: https://github.com/OpenMS/OpenMS/blob/develop/src/openms/include/OpenMS/CHEMISTRY/ResidueDB.h#L95C90-L96C8), but there B may be asparagine or aspartate, J may be isoleucine or leucine, etc. This is of course not what you want, so I think adding a new residue set with B and J is not the right solution. What I would suggest instead is to explicitly use the OpenMS encoding for modified amino acids when it is called in EasyPQP. For example, if "PEPTIDEJ" is passed to EasyPQP, use the Unimod modification "Label:13C(6)", as in "PEPTIDEL(Label:13C(6))", to get the sequence with the modified amino acid. This also works for delta masses if required. Do you know if that would be an option? If not, can you point me to the code where it fails so I can take a look? Best
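The rewrite suggested above can be sketched in a few lines of plain Python. This is only an illustration, not EasyPQP or OpenMS code: the function name `expand_placeholders` and the mapping table are hypothetical, and only the J entry is taken from the example in this comment. Entries for B, X, or Z would depend on what those letters encode in a given database. The sketch also assumes the input is a plain one-letter sequence without modifications already written in parentheses.

```python
# Hypothetical mapping from placeholder letters to explicit modified residues.
# Only the J entry comes from the example in this thread; any other entries
# are setup-specific and must be supplied by the user.
PLACEHOLDER_TO_MODIFIED = {
    "J": "L(Label:13C(6))",
}


def expand_placeholders(sequence, mapping=PLACEHOLDER_TO_MODIFIED):
    """Replace each placeholder letter with its explicit modified-residue form.

    Assumes `sequence` contains only one-letter residue codes (no existing
    parenthesized modifications).
    """
    return "".join(mapping.get(residue, residue) for residue in sequence)


print(expand_placeholders("PEPTIDEJ"))  # prints PEPTIDEL(Label:13C(6))
```

The resulting string could then be handed to whatever sequence parser the caller uses (for example `AASequence.fromString` in pyOpenMS), instead of the raw sequence containing the placeholder letter.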

fcyu commented 3 months ago

Thanks for the prompt and detailed answers. I agree that using B or J to represent a modified amino acid is not ideal. We also found that some other tools do not support it well either.

We will change MSFragger to support protein specified modifications in the future.

Best,

Fengchao