Rappsilber-Laboratory / XiSearch

XiSearch
Apache License 2.0

Minimum peptide length issue #71

Closed cxdummies closed 1 year ago

cxdummies commented 1 year ago

I wanted to identify peptides with a minimum length of 4 amino acids, so I changed the search setting to

MINIMUM_PEPTIDE_LENGTH:4

The peptides I could identify still had a minimum length of 6. I had already removed the # (pound sign) preceding the line.

What is the default minimum peptide length? Is it 2 (as written in the setting annotation) or 6?

Could you please let me know how to address this issue?

Thank you.

lutzfischer commented 1 year ago

xiSEARCH is probably not your problem here. By default it should try to identify these short peptides as well.

These peptides get filtered out by xiFDR. The reason is that for very short peptides the target-decoy equivalence can break down. When xiSEARCH digests the decoy proteins, it checks whether a target peptide with the same sequence exists. If one is found, the decoy sequence gets permuted to find one that is not yet present as a target. But for very short peptides and largish databases this will not always succeed. In effect, an FDR calculated over matches that include very short peptides can underestimate the actual error.
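The decoy check described above can be sketched roughly as follows. This is only an illustration and not xiSEARCH's actual code: the function name `permute_decoy`, the retry limit and the use of a random shuffle are all assumptions made for the example.

```python
import random


def permute_decoy(decoy, targets, max_tries=100, seed=0):
    """Return a permutation of `decoy` that is not a target sequence.

    Hypothetical helper: returns None if no unique permutation was found,
    which is the failure case for very short peptides.
    """
    rng = random.Random(seed)
    candidate = decoy
    for _ in range(max_tries):
        if candidate not in targets:
            return candidate
        # Candidate collides with a target: shuffle the residues and retry.
        residues = list(candidate)
        rng.shuffle(residues)
        candidate = "".join(residues)
    return candidate if candidate not in targets else None


targets = {"ACDK", "AAAA"}
unique = permute_decoy("ACDK", targets)   # some reordering of A, C, D, K
stuck = permute_decoy("AAAA", targets)    # every permutation is "AAAA"
```

For `"AAAA"` no permutation can differ from the target, so the sketch gives up and returns `None`. A short decoy in a large database runs into the same situation once all of its permutations are already taken by targets.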

If you only search a few proteins, that is probably not a problem, but if you search a few hundred or a few thousand proteins it could negatively affect the accuracy of the FDR calculation. Very short peptides also have a higher chance of being matched randomly, so even if the FDR is correct, they will negatively affect the overall result; that is, you get more results passing a given FDR if these get filtered out.
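To get a feeling for why length matters so much, here is a small back-of-the-envelope sketch. It assumes the 20 standard amino acids and ignores digestion rules and modifications; the helper names are made up for this example.

```python
from collections import Counter
from math import factorial


def distinct_permutations(peptide):
    """Number of distinct reorderings of the residues in `peptide`."""
    count = factorial(len(peptide))
    for repeats in Counter(peptide).values():
        count //= factorial(repeats)
    return count


def sequence_space(length, alphabet=20):
    """Number of possible sequences of a given length (20 residues assumed)."""
    return alphabet ** length


print(distinct_permutations("ACDK"))  # 24
print(distinct_permutations("AAKK"))  # 6
print(sequence_space(4))              # 160000
print(sequence_space(6))              # 64000000
```

A 4-residue peptide has at most 24 permutations to choose a decoy from, and the whole space of 4-residue sequences is 400 times smaller than that of 6-residue sequences, so both collisions with targets and random matches become more likely.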

With all that said, if you want to include them, then you have to run xiFDR in GUI mode and on the FDR Settings tab select complete FDR. There you can change the setting for Min Pep. Length to something lower.

cxdummies commented 1 year ago

That works!

I did search only a few proteins. I had to reduce the peptide length because I knew that a particular peptide of 4 amino acids had to be crosslinked. Indeed, the crosslinks involving this peptide turned out to have the highest scores.

Do you think lowering FDR setting at residual pairs would help to offset the increased number of random matches? Could you suggest some changes to the settings so that the overall results would not be affected too much due to a shorter peptide length?

Many thanks.

lutzfischer commented 1 year ago

> Do you think lowering FDR setting at residual pairs would help to offset the increased number of random matches?

I am not sure I really understand you here. There are two general problems with short peptides. The first is the target-decoy equivalence. This one should have relatively little impact here, as you have few proteins, and even with 4 amino acids xi should be able to generate enough unique short decoy peptides. The second is that short peptides have a higher chance of random matches, and (if we ignore the first problem) as a result fewer matches will pass at a given FDR.

How much of an impact the short peptides have on your FDR result depends largely on how many of them actually make it through. If only one or two among a few hundred are short, there is no problem at all. If they dominate your result, the situation would look different.

> Could you suggest some changes to the settings so that the overall results would not be affected too much due to a shorter peptide length?

xiFDR has the ability to treat these separately at the level of CSMs and peptide pairs. Depending on how many of these short peptides there are, this might help (FDR Settings -> Define Groups -> Peptide Length). Defining a group of 5 or 6 (meaning everything shorter than 5 or 6 is treated separately) and then defining a 5% or 10% peptide pair FDR in addition to the residue pair FDR might include these in a more sensible way. At the level of residue pairs that can still have a rather negative impact; it can, but does not have to. BUT you would need sufficient short peptide matches to calculate a separate FDR for these, otherwise your FDR on these means next to nothing.
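As a rough illustration of what grouping by peptide length does, here is a minimal sketch. It is not xiFDR's implementation: the function names, the rule of grouping by the shorter of the two peptides, and the use of the common crosslink estimate (TD - DD) / TT per group are all assumptions for this example, and it ignores score thresholds entirely.

```python
from collections import defaultdict


def length_group(len_a, len_b, cutoff=5):
    """Put a peptide pair in the 'short' group if either peptide is below cutoff."""
    return "short" if min(len_a, len_b) < cutoff else "long"


def fdr_per_group(matches, cutoff=5):
    """Estimate an FDR per length group from (len_a, len_b, kind) tuples.

    `kind` is "TT", "TD" or "DD" (target-target, target-decoy, decoy-decoy).
    """
    counts = defaultdict(lambda: {"TT": 0, "TD": 0, "DD": 0})
    for len_a, len_b, kind in matches:
        counts[length_group(len_a, len_b, cutoff)][kind] += 1
    fdr = {}
    for group, c in counts.items():
        # Assumed estimate: (TD - DD) / TT, clamped at zero.
        fdr[group] = max(0, c["TD"] - c["DD"]) / c["TT"] if c["TT"] else None
    return fdr


matches = [
    (4, 10, "TT"), (4, 8, "TD"),                     # short group
    (7, 9, "TT"), (7, 9, "TT"), (12, 6, "TT"),
    (9, 9, "TT"), (8, 8, "TD"),                      # long group
]
print(fdr_per_group(matches))  # {'short': 1.0, 'long': 0.25}
```

With only two matches in the short group, a single decoy swings the estimate to 100%, which is the point made above: without enough short peptide matches the separate FDR means next to nothing.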

So as long as you have only a few proteins, and unless you have a lot of short peptide matches, I would just reduce the minimum peptide length, see if that gives more interesting links, and not worry overly much about the total number of links.

Most other methods that I can see would basically leave you with an uncertain error rate in the end.

cxdummies commented 1 year ago

Could you explain the difference between reduced and complete FDR? Could you recommend a paper that describes xiFDR in detail? Many thanks!

lutzfischer commented 1 year ago

The difference is just that in complete FDR you see all the options, meaning you can change a lot more settings. All the settings are used in the reduced version as well, they are just not shown. There is no up-to-date paper that describes all the options, but if you hover with the mouse over an input field it should give you a short description of what it does.