Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

Very long times in TimsTOF data + diaPASEF/diaTracer on semi-specific + variable modification searches (compared to Astral) #1905

Open MiguelCos opened 2 days ago

MiguelCos commented 2 days ago

Dear FragPipe team,

We are comparing identification and quantitation numbers from DIA data acquired on a timsTOF Ultra and a Thermo Astral.

Samples are N-terminally enriched extracts from Arabidopsis prepared with a HUNTER protocol (protein extracts are dimethylated to label N-termini, which are potential products of proteolysis, and the labelled N-terminal peptides are then enriched).

Searches on this data are expected to have a large search space: they need to be N-terminal semi-specific, and we would normally set peptide N-terminal dimethyl and acetyl, K-dimethyl, and pyro-Glu as variable modifications.

I am currently running two test searches:

MSFragger parameters are the same for both (as summarized above), but the difference in search times is substantial.

Other notes: the mzML files generated by diaTracer (timsTOF) are on a local hard drive, while the Astral .raw files are on a network drive.

Do you have particular experience with semi-specific or other large search spaces and timsTOF + IM data?

What could be increasing the search space so much in the first search when the number of fragments seems to be similar between both approaches? (maybe I am missing something). Also, the size of the .raw files is considerably larger than the diaTracer-generated mzMLs from the .d timsTOF data.

I would be glad to have some of your thoughts on this and maybe some ideas to improve search efficiency.

As usual, thanks for the hard work on developing and maintaining FragPipe!

Best wishes, Miguel

fcyu commented 2 days ago

Hi Miguel,

If the diatracer.mzML files are smaller than the .raw files, I think something is wrong with your local hard drive or file system, because the file loading time was extremely long and kept increasing:

  1. HUNTER1Tryp500pgDIA22min_Slot2-27_1_4017_diatracer.mzML 1630.1 s | deisotoping 2.0 s [progress: 107395/107395 (100%) - 6311 spectra/s] 17.0s | postprocessing 1.1 s
  2. HUNTER2Tryp500pgDIA22min_Slot2-27_1_4018_diatracer.mzML 6472.0 s | deisotoping 2.8 s [progress: 185155/185155 (100%) - 1985 spectra/s] 93.3s | postprocessing 75.0 s
  3. HUNTER3Tryp500pgDIA22min_Slot2-27_1_4019_diatracer.mzML 1682.7 s | deisotoping 0.8 s [progress: 110425/110425 (100%) - 5054 spectra/s] 21.9s | postprocessing 13.9 s
  4. HUNTER1Tryp5ngDIA22min_Slot2-26_1_4005_diatracer.mzML 40657.2 s | deisotoping 9.7 s [progress: 426829/426829 (100%) - 6506 spectra/s] 65.6s | postprocessing 14.9 s
  5. HUNTER2Tryp5ngDIA22min_Slot2-26_1_4006_diatracer.mzML 43213.4 s | deisotoping 5.6 s [progress: 467150/467150 (100%) - 6990 spectra/s] 66.8s | postprocessing 9.5 s
  6. HUNTER3Tryp5ngDIA22min_Slot2-26_1_4007_diatracer.mzML 48608.2 s | deisotoping 4.2 s [progress: 453708/453708 (100%) - 7226 spectra/s] 62.8s | postprocessing 5.1 s

For the raw files, the issue is less critical, although the I/O was also slow:

  1. 240131_Demo_Huesgen_Nterm_250ng_3Da_5ms_30min_rep01.raw 1335.8 s | deisotoping 20.5 s [progress: 288290/288290 (100%) - 3005 spectra/s] 95.9s | postprocessing 6.2 s
  2. 240131_Demo_Huesgen_Nterm_250ng_3Da_5ms_30min_rep02.raw 1304.6 s | deisotoping 19.1 s [progress: 288263/288263 (100%) - 3062 spectra/s] 94.1s | postprocessing 4.5 s
  3. 240131_Demo_Huesgen_Nterm_250ng_3Da_5ms_30min_rep03.raw 1288.8 s | deisotoping 20.2 s [progress: 288148/288148 (100%) - 3251 spectra/s] 88.6s | postprocessing 3.7 s
  4. 240131_Demo_Huesgen_Nterm_3Th_7ms_25ng_30min_rep01.raw 595.5 s | deisotoping 8.6 s [progress: 222758/222758 (100%) - 3614 spectra/s] 61.6s | postprocessing 3.2 s
  5. 240131_Demo_Huesgen_Nterm_3Th_7ms_25ng_30min_rep02.raw 642.8 s | deisotoping 9.1 s [progress: 222801/222801 (100%) - 3988 spectra/s] 55.9s | postprocessing 3.8 s
  6. 240131_Demo_Huesgen_Nterm_3Th_7ms_25ng_30min_rep03.raw 638.5 s | deisotoping 9.1 s [progress: 222737/222737 (100%) - 4004 spectra/s] 55.6s | postprocessing 4.3 s

I think the speed difference is not because of the search space, but the file loading.
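To make the comparison above easier to repeat on a full log, here is a minimal Python sketch that pulls the timings out of lines shaped like the ones quoted here and reports what share of the per-file time went to loading. The regular expression, the function name `parse_timing`, and the field names are assumptions made for illustration. They are based only on the excerpts in this thread, with the first number read as the file loading time and the number after the progress bracket read as the search time; this is not an official MSFragger log format.

```python
import re

# Pattern inferred from the log excerpts quoted in this thread (an assumption,
# not a documented format): "<file> <load> s | deisotoping <t> s
# [progress: n/n (100%) - <rate> spectra/s] <search>s | postprocessing <t> s"
LINE_RE = re.compile(
    r"(?P<name>\S+\.(?:mzML|raw))\s+"
    r"(?P<load>[\d.]+)\s*s\s*\|\s*deisotoping\s+(?P<deiso>[\d.]+)\s*s\s*"
    r"\[progress:\s*\d+/(?P<spectra>\d+)\s*\(\d+%\)\s*-\s*"
    r"(?P<rate>\d+)\s*spectra/s\]\s*"
    r"(?P<search>[\d.]+)\s*s\s*\|\s*postprocessing\s+(?P<post>[\d.]+)\s*s"
)


def parse_timing(line):
    """Return the timings found in one log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    load = float(m["load"])
    deiso = float(m["deiso"])
    search = float(m["search"])
    post = float(m["post"])
    total = load + deiso + search + post
    return {
        "name": m["name"],
        "load_s": load,
        "search_s": search,
        "spectra": int(m["spectra"]),
        # Share of the per-file time spent on loading rather than searching.
        "load_fraction": load / total,
    }


# Two lines copied from the excerpts above.
example_lines = [
    "HUNTER1Tryp5ngDIA22min_Slot2-26_1_4005_diatracer.mzML 40657.2 s | "
    "deisotoping 9.7 s [progress: 426829/426829 (100%) - 6506 spectra/s] "
    "65.6s | postprocessing 14.9 s",
    "240131_Demo_Huesgen_Nterm_250ng_3Da_5ms_30min_rep01.raw 1335.8 s | "
    "deisotoping 20.5 s [progress: 288290/288290 (100%) - 3005 spectra/s] "
    "95.9s | postprocessing 6.2 s",
]

for text in example_lines:
    timing = parse_timing(text)
    if timing is not None:
        print(f"{timing['name']}: load {timing['load_s']:.0f} s, "
              f"search {timing['search_s']:.0f} s, "
              f"load share {timing['load_fraction']:.1%}")
```

If loading dominates in both runs while the search times stay comparable, that points to I/O rather than the search space.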

Best,

Fengchao

MiguelCos commented 2 days ago

Hello Fengchao,

Thanks for your comment. That makes sense.

I will try running the search from different drives and see the effect.
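Before re-running the full search, a quick way to compare drives is to time a plain sequential read of the same file from each location. The sketch below is only a rough check: the function name `read_throughput_mb_s` and the example paths are hypothetical, and it is not how MSFragger reads files. A second read of the same file may also be served from the operating system cache and look much faster than the drive really is.

```python
import time


def read_throughput_mb_s(path, chunk_size=8 * 1024 * 1024):
    """Read a file sequentially and return the observed throughput in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total_bytes / (1024 * 1024) / max(elapsed, 1e-9)


# Hypothetical usage: copy one diatracer.mzML file to each candidate drive
# and compare the numbers.
# print(read_throughput_mb_s(r"D:\data\sample_diatracer.mzML"))
# print(read_throughput_mb_s(r"\\server\share\sample_diatracer.mzML"))
```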

Best wishes, Miguel