compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines
http://compomics.github.io/projects/searchgui.html

sage is really slow and peptide shaker won't open the output #343

Closed by justmpm 1 year ago

justmpm commented 1 year ago

Hi, I know this is a beta, so I am not complaining. I just wanted to let you know about my experience running Sage in SearchGUI and PeptideShaker and help make a great tool even better!

I had a lot of trouble getting Sage to work in SearchGUI. No matter how I configured the settings, the search overwhelmed the system resources and took forever (>90 minutes) to finish, and then the output wouldn't open in PeptideShaker. If I take the sage.json file generated by SearchGUI and run it on a stand-alone install of Sage, it finishes in ~4 minutes and creates the same output as Sage in SearchGUI. There were some errors when running from the command line, and I have attached a zip with the cmd prompt output, the PeptideShaker log, the tsv output and the SearchGUI report. Some of the columns, such as those reporting FDR, have the same value for all the matches. If I run the same data in SearchGUI with X!Tandem or some of the other engines, I don't have any issues with system resources, run time, or compatibility with PeptideShaker.

In case it matters: SearchGUI is running on a Windows machine with a 4-core/8-thread CPU and 16 GB of RAM. The data is from an Agilent QTOF (about 14,000 spectra), converted to mzML with msconvert to be compatible with TPP/MSFragger. SearchGUI is v4.2.1, PeptideShaker is v2.2.19, stand-alone Sage is v0.8.0, and Java is version 8 update 361.
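The observation that some columns hold the same value for every match can be checked mechanically. The following is a minimal sketch, not part of SearchGUI or Sage: it reads tab-separated text and reports every column that contains a single distinct value. The function name `constant_columns`, the sample rows, and the column name `q_value` are made up for illustration and are not taken from the attached output.

```python
import csv
import io


def constant_columns(tsv_text):
    """Return {column: value} for columns holding one distinct value across all rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen = {}
    for row in reader:
        for name, value in row.items():
            seen.setdefault(name, set()).add(value)
    return {name: next(iter(values)) for name, values in seen.items() if len(values) == 1}


# Made-up rows for illustration only (hypothetical column names and values).
sample = "peptide\tscore\tq_value\nPEPTIDEK\t12.5\t1.0\nSAMPLER\t8.1\t1.0\n"
print(constant_columns(sample))
```

For a real results file, the text could be read with `open(path, newline="").read()` and passed to the same function. A q-value-like column that is constant across thousands of matches would be a hint that the scoring or decoy handling went wrong upstream.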

Please let me know if you need anything else!

Thanks!

Cheers Mike

SAGEissues.zip

hbarsnes commented 1 year ago

Hi Mike,

Thanks for testing and for the detailed feedback! I will look into this right away and get back to you.

Best regards, Harald

hbarsnes commented 1 year ago

This is all very strange, as all SearchGUI is doing is creating the Sage command line and running it. Can you double-check that the Sage command line executed in SearchGUI is identical to the one you tried directly from the terminal? You'll find it in the SearchGUI-4.2.1\resources\SearchGUI.log file.

With regards to the json content, I spot the following four differences:

=1=

SearchGUIreport.txt:

  "output_paths": [
    "C:\\SearchGUI-4.1.24-windows\\SearchGUI-4.2.1\\resources\\temp\\search_engines\\sage\\results.sage.tsv",
    "C:\\SearchGUI-4.1.24-windows\\SearchGUI-4.2.1\\resources\\temp\\search_engines\\sage\\results.json"
  ]

vs.

cmdpromptoutput.txt:

  "output_paths": [
    "C:\\sage\\sage-v0.8.0-x86\\results.sage.tsv",
    "C:\\sage\\sage-v0.8.0-x86\\results.json"
  ]

I don't see why this one should matter, though.

=2=

SearchGUIreport.txt: "parallel": true,

vs.

cmdpromptoutput.txt: "parallel": false,

This could make a difference, but if anything, "parallel": true should be quicker?

=3=

SearchGUIreport.txt: "lfq": null

vs.

cmdpromptoutput.txt: "lfq": true

Again, the second option should potentially be quicker?

=4=

SearchGUIreport.txt:

    "variable_mods": {
      "T": 79.96633,
      "M": 15.994915,
      "N": 0.9840156,
      "K": 114.04293,
      "Q": 0.9840156,
      "S": 79.96633
    },

vs.

cmdpromptoutput.txt:

    "variable_mods": {
      "K": 114.04293,
      "Q": 0.9840156,
      "M": 15.994915,
      "T": 79.96633,
      "S": 79.96633,
      "N": 0.9840156
    },

The order is different, but I'm not sure why that would matter.

Can you try changing these parameters one by one and see if you can find the one(s) that result in the different run times?
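A comparison like the one above can also be automated. Here is a minimal sketch that walks two parsed JSON documents and lists every leaf value that differs. The helper name `diff_json` is hypothetical, and the two inputs are trimmed-down fragments built from the values quoted above, with the nesting simplified. Because JSON objects are compared as unordered mappings, the reordered "variable_mods" entries are not reported as a difference.

```python
import json


def diff_json(a, b, path=""):
    """Yield (path, value_a, value_b) for every leaf that differs between a and b."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            yield from diff_json(a.get(key, "<missing>"), b.get(key, "<missing>"), f"{path}/{key}")
    elif a != b:
        # Lists are compared as a whole, so element order matters for them.
        yield path, a, b


gui = json.loads('{"parallel": true, "lfq": null, "variable_mods": {"T": 79.96633, "M": 15.994915}}')
cli = json.loads('{"parallel": false, "lfq": true, "variable_mods": {"M": 15.994915, "T": 79.96633}}')

for where, gui_value, cli_value in diff_json(gui, cli):
    print(where, gui_value, cli_value)
```

On these fragments only "lfq" and "parallel" are reported, which matches differences =2= and =3= and supports the view that the key order in =4= is not a real difference at the JSON level.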

justmpm commented 1 year ago

Hi, I was able to run it fine from the command line, and none of the other changes seemed to make a difference, either by themselves or in combination with each other. There is an issue somewhere, though: I can get 3 or maybe 4 successive searches to run using SearchGUI and Sage, but then it hits some kind of snag and starts freezing up. Sometimes it freezes for a few minutes and then finishes, and sometimes it just freezes up completely. It freezes once it reaches 99% memory use (in Task Manager). It is super strange, because it freezes at 99% even though there does not seem to be enough memory allocated to the Sage processes to cause the problem. Sometimes it resolves by itself and the process finishes; sometimes it doesn't, and the computer needs to be rebooted. I attached some pics of the Task Manager, one from the middle of a run that finished and two from when it froze up.

There doesn't seem to be a way to limit the system resources used by Sage. I tried changing the bit bucket in the settings (between 6 GB and 32 GB), but this doesn't change the memory usage during the analysis. Do you know whether error logging is turned on in SearchGUI or Sage?

Also, when Sage finishes successfully, PeptideShaker will not automatically open the results. They can be imported manually, but only by themselves and not together with results from another search engine. It looks like there is something wrong with the statistics, because PeptideShaker labels all the matches from Sage as not significant.

Cheers Mike

TaskManagerPics.zip

justmpm commented 1 year ago

Hi, I am able to get Sage running by limiting the peptide length to <30 a.a. and by changing the precursor error from 25 ppm to 15 ppm. So far this keeps it from using all the RAM and freezing up, and it runs quite fast, at around 3 minutes.
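For anyone who wants to apply the same two changes to a sage.json by script rather than through the settings dialog, here is a rough sketch. The key names `database.enzyme.max_len` and `precursor_tol.ppm` are an assumption about the Sage configuration layout and should be checked against your own sage.json. The function name `tighten_search` and the starting values in `original` are made up for illustration.

```python
import json


def tighten_search(config, max_len=30, ppm=15.0):
    """Return a copy of a Sage-style config with a shorter maximum peptide
    length and a narrower, symmetric precursor tolerance window."""
    patched = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    # ASSUMPTION: these key names mirror the Sage config layout; verify locally.
    patched.setdefault("database", {}).setdefault("enzyme", {})["max_len"] = max_len
    patched["precursor_tol"] = {"ppm": [-ppm, ppm]}
    return patched


# Hypothetical starting values, not taken from the attached files.
original = {"database": {"enzyme": {"max_len": 50}}, "precursor_tol": {"ppm": [-25.0, 25.0]}}
patched = tighten_search(original)
print(json.dumps(patched, indent=2))
```

The input is left untouched, so the original and patched configurations can be run side by side to see which change has the larger effect on memory use.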

I also noticed a Java "failed to start" error, so I got rid of all the '-' and '.' in the paths, but there is still an issue getting PeptideShaker to automatically load Sage data from SearchGUI. I am now able to get stats back from Sage. The problem was in the Sage settings in SearchGUI: the default tag for decoys is 'rev_', but SearchGUI generates '_REVERSED' as the tag. Using '_REVERSED' as the decoy tag in the Sage settings sorted out the problem with the statistics.
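A quick way to see which decoy tag a FASTA file actually uses is to count header lines containing each candidate. This is a minimal sketch with a hypothetical helper, `count_decoy_tags`. The two tags are the ones mentioned above, and the example headers are made up.

```python
def count_decoy_tags(fasta_lines, tags=("rev_", "_REVERSED")):
    """Count FASTA header lines (starting with '>') that contain each candidate tag."""
    counts = {tag: 0 for tag in tags}
    for line in fasta_lines:
        if line.startswith(">"):
            for tag in tags:
                if tag in line:  # case-sensitive substring match
                    counts[tag] += 1
    return counts


# Made-up headers for illustration only.
example = [
    ">sp|P00001|PROT_A",
    "MKT",
    ">sp|P00001_REVERSED|PROT_A-REVERSED",
    "TKM",
]
print(count_decoy_tags(example))
```

For a real database, the lines could come from `open(path)` directly. If the tag configured in the search engine has a count of zero, no decoys will be recognised, and FDR-style statistics cannot be computed sensibly.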

Thanks for looking into this, but I think it is just an issue with sage needing a lot of system resources if the search is run too aggressively on systems without enough RAM.

Cheers Mike

hbarsnes commented 1 year ago

Hi Mike,

Thanks again for the detailed testing! I've been stuck in meetings all day, but finally had some time to look into the remaining issues.

Great that you were able to sort out the problem with the decoy tags! I was wondering what had happened to the Sage scores. I guess I was thinking that the Sage decoy tag setting did not matter when the Sage option to generate decoys was turned off, but I was clearly wrong. It has now been changed to use the same default as in SearchGUI and a new SearchGUI version will be released shortly.

The issue with opening the Sage results directly from SearchGUI in PeptideShaker has also been fixed. I had simply forgotten to add the Sage output file type to the list of supported file types in the command line version of PeptideShaker. I will therefore release a new version of PeptideShaker as well, hopefully tomorrow.

Best regards, Harald

justmpm commented 1 year ago

Hi Harald, Thanks for all your help, I am glad all the issues were either my own problems or easy fixes! I think the LFQ output from Sage could be really useful in general, and I was wondering if you were planning on implementing some LFQ analysis in PeptideShaker? When I get some time I will compare the Sage output to the LFQ in MetaMorpheus, but as I remember, it is not implemented in SearchGUI. Also, right now, I can either run multiple spectra files as if they were fractions, or load them one by one in SearchGUI to produce separate output files. It would be great if I could load a bunch of files and have SearchGUI run them iteratively, one after the next, without any user intervention.

Thanks again!

Cheers Mike

hbarsnes commented 1 year ago

Hi Mike,

> I think the LFQ output from sage could be really useful in general, and I was wondering if you were planning on implementing some LFQ analysis in PeptideShaker?

That would indeed be very useful. I'll see what I can do, but no guarantees. I'm focusing on MS3 TMT support at the moment (see https://github.com/compomics/reporter).

> I can either run multiple spectra files as if they were fractions, or load them one by one in searchgui to produce separate output files, it would be great if I could load a bunch of files and have SearchGUI run them iteratively one after the next without any user intervention.

For the moment this is indeed not supported, but you can always do this via the command lines. See https://github.com/compomics/searchgui/wiki/SearchCLI and https://github.com/compomics/peptide-shaker/wiki/PeptideShakerCLI.
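As a rough illustration of the command line route, the sketch below builds one SearchCLI invocation per spectrum file, each with its own output folder, so the files are searched separately rather than as fractions. The class name and the options `-spectrum_files`, `-fasta_file`, `-id_params` and `-output_folder` are written from memory and should be verified against the SearchCLI wiki page linked above. The jar name, file names and the helper `build_commands` are placeholders, and the options that select the search engines are deliberately left out.

```python
from pathlib import Path


def build_commands(spectrum_files, fasta, params, out_root, jar="SearchGUI-X.Y.Z.jar"):
    """Build one SearchCLI command (as an argument list) per spectrum file."""
    commands = []
    for spectrum in spectrum_files:
        # One output folder per input file, named after the file without extension.
        out_dir = Path(out_root) / Path(spectrum).stem
        commands.append([
            "java", "-cp", jar, "eu.isas.searchgui.cmd.SearchCLI",  # ASSUMPTION: check the wiki
            "-spectrum_files", str(spectrum),
            "-fasta_file", str(fasta),
            "-id_params", str(params),
            "-output_folder", str(out_dir),
        ])
    return commands


# Placeholder file names for illustration only.
for command in build_commands(["run1.mzML", "run2.mzML"], "db.fasta", "search.par", "results"):
    print(" ".join(command))
```

The sketch only prints the commands. Once they have been checked against the wiki, each list could be passed to `subprocess.run(command, check=True)` in the same loop so that the searches run one after the other without user intervention.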

I can look into implementing basic support for this in the graphical user interface as well. The real challenge is in any case to properly compare the results across multiple files/projects. This has been on our to-do list for a very long time, but we never seem to get the time and resources to implement it...

Best regards, Harald

hbarsnes commented 1 year ago

Hi again Mike,

I just released new versions of both SearchGUI and PeptideShaker which should fix both the issue with the Sage decoy tag (now using the same tag as the one in the FASTA parsing parameters) and the issue with opening the search results directly in PeptideShaker.

Unless you disagree (or the issues are still not solved in the new releases), I would recommend that we close this particular issue?

I would also urge you to set up separate issues for the new features/enhancements you have suggested as part of this discussion (some of which fit better under the PeptideShaker issue tracker), as I'm afraid that otherwise they may easily be forgotten.

Best regards, Harald