ypriverol opened 9 months ago
Quick fix: add the Comet id in Enzymes.xml (https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/CHEMISTRY/Enzymes.xml#L142-L163) and add the corresponding output in CometAdapter.cpp (https://github.com/OpenMS/OpenMS/blob/develop/src/topp/CometAdapter.cpp#L567-L579).
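A minimal sketch of what this quick fix amounts to, assuming the adapter resolves an OpenMS enzyme name to a Comet enzyme number through a lookup and rejects names it does not know. The table, the function name `comet_enzyme_number`, and all enzyme numbers below are hypothetical placeholders, not the real contents of Enzymes.xml or CometAdapter.cpp.

```python
# Hypothetical lookup: OpenMS enzyme name -> Comet enzyme number.
# The numbers are placeholders, not taken from Enzymes.xml or CometAdapter.cpp.
COMET_ENZYME_IDS = {
    "Trypsin": 1,
    "Trypsin/P": 2,
    "Lys-C": 3,
}


def comet_enzyme_number(enzyme_name: str, table: dict[str, int]) -> int:
    """Return the Comet enzyme number for an OpenMS enzyme name, or fail loudly."""
    try:
        return table[enzyme_name]
    except KeyError:
        supported = ", ".join(sorted(table))
        raise ValueError(
            f"Enzyme '{enzyme_name}' has no Comet id (supported: {supported})"
        ) from None


# Adding one entry (the "comet id") is what makes Lys-C/P resolvable.
extended = {**COMET_ENZYME_IDS, "Lys-C/P": 12}  # 12 is a placeholder index
```

With the original table, `comet_enzyme_number("Lys-C/P", COMET_ENZYME_IDS)` raises `ValueError`; with `extended` it returns the new id. The second half of the fix (writing the matching enzyme definition into Comet's parameter file) is a separate step.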
@ypriverol this would actually be a good entry-level task for a student who wants to get into OpenMS/C++.
For both Lys-C and multiple enzymes you will then need to give up ConsensusID compatibility. A fix for Lys-C is just a simple if-case in the workflow.
Multi-enzyme support is a large change in both OpenMS and the workflow. OpenMS needs to support it in its data structures and in steps such as peptide indexing. You need not only support for multiple enzymes but also logic for whether they were applied at the same time or one after the other. It will probably also be incompatible with your own or workflow-generated decoy databases, unless you run multiple searches with different enzymes (and generate one decoy database per enzyme). You would need to use Comet's decoy generation. Therefore it is probably easiest to run Comet without the adapter and convert the results to idXML afterwards.
I agree with Julianus that properly modelling multi-enzyme digestion adds a lot of complexity. One note: you often see the Lys-C/Trypsin combination because it improves cleavage after K. From a search engine perspective, the combination can simply be treated as Trypsin (or even Trypsin/P), because Lys-C cuts at a subset of Trypsin's cleavage sites. So maybe that complexity is not needed?
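The subset argument can be checked on a toy sequence. This sketch writes each rule as a zero-width regex ("cut after X unless followed by P"); the patterns are illustrative assumptions in the spirit of the RegEx entries in Enzymes.xml, not copied from it.

```python
import re

# Cleavage rules as zero-width regexes (illustrative, not copied from OpenMS Enzymes.xml).
RULES = {
    "Trypsin": r"(?<=[KR])(?!P)",  # after K or R, not before P
    "Trypsin/P": r"(?<=[KR])",     # after K or R, proline rule ignored
    "Lys-C": r"(?<=K)(?!P)",       # after K, not before P
}


def cleavage_sites(sequence: str, pattern: str) -> set[int]:
    """Return positions i where the enzyme cuts between sequence[i-1] and sequence[i]."""
    # A match at the very end (after the C-terminal residue) is not a real cut, so drop it.
    return {m.start() for m in re.finditer(pattern, sequence) if m.start() < len(sequence)}


protein = "MKPAKRGLKDERPSTK"
lysc = cleavage_sites(protein, RULES["Lys-C"])          # {5, 9}
trypsin = cleavage_sites(protein, RULES["Trypsin"])     # {5, 6, 9}
trypsin_p = cleavage_sites(protein, RULES["Trypsin/P"])  # {2, 5, 6, 9, 12}

# Lys-C sites are a subset of Trypsin sites, so Lys-C + Trypsin cuts exactly like Trypsin.
assert lysc <= trypsin <= trypsin_p
assert lysc | trypsin == trypsin
```

Because the union of the Lys-C and Trypsin sites equals the Trypsin sites, searching the combined digest as Trypsin loses nothing, which is the point made above.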
I'm trying to tackle the first, quite common use case here: using a different single enzyme, not multiple enzymes. After that, it should be easy to extend OpenMS to support more enzymes.
We could implement a workaround for this special case at the workflow level by allowing multiple enzymes only there. Then you would see Trypsin/Lys-C in the workflow reports but Trypsin as far as OpenMS is concerned.
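A minimal sketch of that workflow-level workaround, assuming the workflow keeps the user's enzyme string for reporting and passes a single OpenMS enzyme to the tools. The alias table and `resolve_enzyme` are hypothetical names, not part of the existing workflow.

```python
# Hypothetical workflow-level aliases: label shown in reports -> enzyme passed to OpenMS.
# Only combinations whose cleavage sites collapse to a single OpenMS enzyme belong here.
ENZYME_ALIASES = {
    "Trypsin/Lys-C": "Trypsin",  # Lys-C sites are a subset of Trypsin sites
}


def resolve_enzyme(requested: str) -> tuple[str, str]:
    """Return (report_label, openms_enzyme) for the enzyme string given by the user."""
    return requested, ENZYME_ALIASES.get(requested, requested)


label, openms_enzyme = resolve_enzyme("Trypsin/Lys-C")
# label == "Trypsin/Lys-C" goes to the reports; openms_enzyme == "Trypsin" goes to OpenMS.
```

Single enzymes pass through unchanged, so this only touches the combined case and leaves OpenMS itself untouched.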
Or we start by adding this special case to OpenMS (introducing a new mixed enzyme). This would then be mainly for reporting purposes.
I don't see why we would need the mixed enzyme. The problem is actually much simpler. We have Lys-C/P, which Comet does support but the OpenMS adapter does not. I want to support it in OpenMS so that we can process the dataset that used only Lys-C/P with MS-GF+ and Comet. No mixed enzymes.
Ah, OK, I completely misread the issue then, haha.
Using Arg-C or Lys-C before trypsin is not an issue, since both cut at a subset of trypsin's sites:
<ITEM name="RegExDescription" value="Arg-C cuts after R residue unless the next residue is P." type="string" />
<ITEM name="RegExDescription" value="Lys-C cuts after K if not followed by P." type="string" />
But Glu-C, as listed on https://www.ebi.ac.uk/pride/archive/projects/PXD005200, is an issue: it cleaves mainly after E but also after D.
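To see why the D cleavages matter, here is a toy digest under two Glu-C rules: E only, and E or D. Both regexes are assumptions for illustration (they ignore any proline rule) and are not the definitions used by OpenMS or Comet.

```python
import re

# Two hypothetical Glu-C rules: E only, and the broader D/E specificity.
GLUC_E = r"(?<=E)"
GLUC_DE = r"(?<=[DE])"


def digest(sequence: str, pattern: str) -> list[str]:
    """Split a protein at every zero-width match of `pattern` (needs Python 3.7+ re.split)."""
    return [peptide for peptide in re.split(pattern, sequence) if peptide]


protein = "AKEGLDSTEVK"
print(digest(protein, GLUC_E))   # ['AKE', 'GLDSTE', 'VK']
print(digest(protein, GLUC_DE))  # ['AKE', 'GLD', 'STE', 'VK']
```

A search configured for E-only cleavage would treat peptides such as `GLD` and `STE` as non-specific, so the enzyme definition has to allow D as well for datasets like this one.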
> We have Lys-C/P which in fact is supported by comet but the Adapter in OpenMS doesn't support it. I want to support it in OpenMS in order to be able to process the dataset that used only Lys-C/P with msgf+ and comet.
OK, that should be easy. You mean something similar to this? https://github.com/OpenMS/OpenMS/pull/7422/files
PR in OpenMS: https://github.com/OpenMS/OpenMS/pull/7584
Description of feature
@timosachsenberg @jpfeuffer Currently, the CometAdapter only supports 'Asp-N,Chymotrypsin,CNBr,no cleavage,unspecific cleavage,Trypsin,Arg-C,Lys-C,Lys-N,PepsinA,Trypsin/P,glutamyl endopeptidase'. However, Comet can read definitions of additional enzymes from its parameter file (https://uwpr.github.io/Comet/parameters/parameters_202301/search_enzyme_number.html). How can we use that to define, for example:
Lys-C/P
Currently, Lys-C will not work, because the MS-GF+ processing changes it to Lys-C/P, which the CometAdapter does not support.
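A sketch of the enzyme definition that would have to end up in Comet's parameter file, assuming the five-column layout of the `[COMET_ENZYME_INFO]` table described on the Comet page linked above: enzyme number, name, sense (1 = cut C-terminal to the residues, 0 = N-terminal), cut residues, and blocking residues (`-` for none). The class name, the enzyme numbers, and the name `Lys_C/P` are hypothetical placeholders; the real number must match the row position in the params file, and `search_enzyme_number` then selects it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CometEnzyme:
    """One row of Comet's [COMET_ENZYME_INFO] table (column layout assumed, see above)."""
    number: int       # placeholder; must match the row number used in the params file
    name: str
    c_terminal: bool  # True -> sense 1 (cut after the residues), False -> sense 0
    cut: str          # residues at which the enzyme cleaves
    no_cut: str = ""  # residues that block cleavage; written as "-" when empty

    def to_line(self) -> str:
        sense = 1 if self.c_terminal else 0
        return f"{self.number}. {self.name} {sense} {self.cut} {self.no_cut or '-'}"


# Lys-C blocks cleavage before P; Lys-C/P drops that restriction.
LYS_C = CometEnzyme(3, "Lys_C", c_terminal=True, cut="K", no_cut="P")
LYS_C_P = CometEnzyme(12, "Lys_C/P", c_terminal=True, cut="K")

print(LYS_C.to_line())    # 3. Lys_C 1 K P
print(LYS_C_P.to_line())  # 12. Lys_C/P 1 K -
```

The only difference between the two rows is the blocking-residue column, which is why Lys-C/P is a small addition to the adapter rather than a new concept.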