tide: Wrote peptides to temp files takes extremely long time

compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines

http://compomics.github.io/projects/searchgui.html


tide: Wrote peptides to temp files takes extremely long time #346

Closed. KKKKK-tech closed this issue 1 year ago

KKKKK-tech commented 1 year ago

Hi. When I used Tide before, this step took only about ten minutes, but now it has been running for five days and still has not finished. What could be the reason? (.html attachments are not supported, so I converted the report to .docx.) SearchGUI Report 2023-03-06 11.15.50.docx

hbarsnes commented 1 year ago

Can you share the SearchGUI log file as well? You can find it via the Help > Bug Report menu option.

Are you using the same database and search settings as when this step took ten minutes? Because if the size of the database is much larger or, for example, the number of variable PTMs is much higher, the Tide indexing will take much longer to complete and you may even in some cases run out of disk space.

KKKKK-tech commented 1 year ago

Sure, here is the log file: SearchGUI 4.2.2 log.txt. The database is the same. I notice that the maximum number of variable PTMs per peptide in the parameters is 255, but this is the default value. Could that be the reason?

hbarsnes commented 1 year ago

> The database is the same, and I notice that the maximum number of variable PTM per peptide in the parameter is 255, but this is the default value, so is that the reason?

I don't think so, as I'm pretty sure you will not have any peptides that can carry that many PTMs anyway.
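To illustrate what a cap such as 255 does and does not limit, the sketch below counts the modified forms of a single peptide under a simplified model: one variable modification type, with each candidate site either modified or unmodified. The helper name `modified_forms` and the model itself are assumptions made for illustration; this is not taken from Tide's code and may not match how Tide enumerates its index.

```python
from math import comb


def modified_forms(n_sites: int, max_mods: int) -> int:
    """Count peptide forms with at most max_mods of n_sites modified.

    Hypothetical helper: assumes a single variable modification type.
    """
    # The cap can never exceed the number of candidate sites.
    limit = min(n_sites, max_mods)
    # Choose k of the n_sites to modify, for every allowed k (including 0).
    return sum(comb(n_sites, k) for k in range(limit + 1))


# Compare a small cap with the default of 255 for a peptide with 10 sites.
for cap in (3, 255):
    print(cap, modified_forms(10, cap))
```

Under this model the cap only changes the count when a peptide has more candidate sites than the cap, and a peptide cannot have more candidate sites than it has residues.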

I do however notice that you have non-default values for the max peptide length (100) and the max peptide mass (10 000). I guess that especially the first one, where the default value is 30, could have a big impact on the time it takes Tide to create its index? Perhaps you can try with the default settings and see if that makes a difference?

KKKKK-tech commented 1 year ago

I've tried the default settings, but very few proteins were identified. Here's the PeptideShaker result. BTW, I want to identify as many proteins as possible for my purposes, and I found that the default max peptide length in pFind, another protein identification tool, is 100, so I set it to 100 in Tide as well. QQ图片20230307103501

hbarsnes commented 1 year ago

> BTW, I want to get as much protein as possible for some purposes, and I found that the default parameter of max peptide length of pfind, another protein identification software, is 100, so I also set 100 in tide.

The mass spectrometer usually works best for peptides much shorter than 100 amino acids. Furthermore, when using trypsin you will find it very uncommon to not have any cleavage sites for such long stretches of amino acids. Hence going up to 100 in max peptide length will cost you a lot in processing time for very little, if any, gain. Therefore we usually only look for peptides between 8 and 30 amino acids.
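A rough way to check this on your own database is to digest a sequence in silico and count how many peptides fall inside each length window. The sketch below is a simplified illustration, not Tide's indexing code: the function names, the "cleave after K or R unless followed by P" rule, and the toy sequence are all assumptions.

```python
import re


def tryptic_peptides(protein: str, missed_cleavages: int = 2) -> list[str]:
    """Simplified trypsin digest: cleave after K or R, but not before P."""
    # Zero-width split points; drop the empty piece left by a C-terminal K/R.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to missed_cleavages neighbouring fragments.
        last = min(i + missed_cleavages + 1, len(fragments))
        for j in range(i, last):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def count_in_range(peptides: list[str], min_len: int, max_len: int) -> int:
    """Count distinct peptides whose length lies in [min_len, max_len]."""
    return sum(min_len <= len(p) <= max_len for p in set(peptides))


# Toy sequence (made up for illustration); replace with a real protein.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK"
peptides = tryptic_peptides(protein)
print("8-30: ", count_in_range(peptides, 8, 30))
print("8-100:", count_in_range(peptides, 8, 100))
```

Running this over every protein in the FASTA file shows how many extra candidates the wider window adds. Note that longer peptides also tend to contain more modifiable residues, so the growth with variable PTMs can be larger than the raw peptide count suggests.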

You may also have a closer look at your other search settings and see if any of them ought to be changed. For example, for TMT data we usually use a precursor tolerance of 10 ppm and a fragment ion tolerance at around 0.01 Da. Perhaps you can try that and see if it makes a difference?
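Since the precursor tolerance is given in ppm and the fragment tolerance in Da, it can help to convert between the two. A ppm tolerance scales with m/z, while a Da tolerance is fixed. The helper names below are hypothetical and the m/z values are arbitrary examples.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Convert a relative tolerance in ppm to an absolute one in Da at a given m/z."""
    return mz * ppm / 1e6


def tolerance_window(mz: float, ppm: float) -> tuple[float, float]:
    """Return the (lower, upper) m/z bounds for a symmetric ppm tolerance."""
    delta = ppm_to_da(mz, ppm)
    return mz - delta, mz + delta


# Width of a 10 ppm tolerance at a few example m/z values.
for mz in (400.0, 1000.0, 2000.0):
    print(mz, ppm_to_da(mz, 10.0), tolerance_window(mz, 10.0))
```

At m/z 1000, 10 ppm corresponds to 0.01 Da, so the two suggested tolerances are of the same order of magnitude.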

Do you see similar results for all of the search engines?

KKKKK-tech commented 1 year ago

> For example, for TMT data we usually use a precursor tolerance of 10 ppm and a fragment ion tolerance at around 0.01 Da

Thanks a lot, it works!

> Do you see similar results for all of the search engines?

No. When I set the precursor and fragment ion tolerances to 15 ppm, all of the engines finish normally except Tide. (And MyriMatch is interrupted while running in PeptideShaker.)

hbarsnes commented 1 year ago

> Thanks a lot, it works!

Great! I will then close the issue.