compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines
http://compomics.github.io/projects/searchgui.html

Error when setting batch size with sage and lfq export #373

Open s-imonb opened 4 weeks ago

s-imonb commented 4 weeks ago

Hello,

When running SearchGUI (4.3.9) with Sage and a batch size set in the Sage settings, I get an error in the formatting of the input string. With the batch size field empty it runs, but only one file at a time.

Here’s the command from the SearchGUI log:

C:\SearchGUI-4.1.0-windows\SearchGUI-4.3.9\resources\Sage\windows\sage.exe -o C:\SearchGUI-4.1.0-windows\SearchGUI-4.3.9\resources\temp\search_engines\sage -batch-size16 --disable-telemetry-i-dont-want-to-improve-sage C:\SearchGUI-4.1.0-windows\SearchGUI-4.3.9\resources\temp\search_engines\sage\sage.json

and here’s the error:

error: unexpected argument '-b' found
  tip: to pass '-b' as a value, use '-- -b'
Usage: sage.exe [OPTIONS] [mzml_paths]...
For more information, try '--help'.

Second, does peptide shaker save the lfq and other output files from sage? I can see the files generated in temp/search engines/sage but it’s not practical to grab them as they are generated.

Thanks! Simon

hbarsnes commented 4 weeks ago

Hi Simon,

Thanks for letting us know about the batch size issue! Basically, we were using -batch-size instead of --batch-size, and there was also a missing space between the option and its value. Both have now been fixed and I will try to find the time to release a new version of SearchGUI next week.
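The two problems described above (a single dash on a long option, and no separator before the value) can be sketched as follows. This is only an illustration in Python, not SearchGUI's actual code; the function name `build_sage_command` and the shortened paths are hypothetical, while the flags are the ones shown in the logged command.

```python
def build_sage_command(sage_exe, output_dir, config_json, batch_size=None):
    """Return the Sage command as a list of separate argument tokens."""
    command = [str(sage_exe), "-o", str(output_dir)]
    if batch_size is not None:
        # Long option with two dashes; the value is passed as its own token,
        # so the parser sees "--batch-size" and "16", not "-batch-size16".
        command += ["--batch-size", str(batch_size)]
    command.append("--disable-telemetry-i-dont-want-to-improve-sage")
    command.append(str(config_json))
    return command


# Hypothetical, shortened paths for illustration only.
cmd = build_sage_command("sage.exe", "temp/sage", "temp/sage/sage.json", batch_size=16)
print(" ".join(cmd))
```

Building the command as a list of tokens, rather than by string concatenation, makes a missing space between an option and its value much harder to introduce.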

> Second, does peptide shaker save the lfq and other output files from sage? I can see the files generated in temp/search engines/sage but it’s not practical to grab them as they are generated.

At the moment, SearchGUI does not keep the lfq or other additional output files from Sage, mainly as the quantification data (at least for LFQ) is also included in the Sage tsv output file. We may look into changing this in the future, but in the meantime it is probably easier to run the Sage command lines directly.

Best regards, Harald

s-imonb commented 3 weeks ago

Thanks for the quick reply Harald, I'm glad it looks like a simple fix.

For the LFQ, I think the sage.tsv output only contains the MS2 intensities, which aren't the most useful for quant. If I match an lfq.tsv output that I grabbed from the temp folder against the sage.tsv, I don't get the same values. It's not a big deal to run it at the command line, but running the analysis via a GUI is nice to have. I have attached a couple of example files.
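One way to line the two files up is sketched below. The column names (`peptide`, `filename`, `ms2_intensity`, and one lfq column per spectrum file) are assumptions about the Sage output layout and should be checked against the real headers; the tiny tables are made up purely so the sketch runs and do not come from the attached files.

```python
import csv
import io
from collections import defaultdict

# Made-up toy tables; real files would be opened with open(path, newline="").
SAGE_TSV = (
    "peptide\tfilename\tms2_intensity\n"
    "PEPTIDEK\tSample1.mzML\t100.0\n"
    "PEPTIDEK\tSample1.mzML\t50.0\n"
)
LFQ_TSV = "peptide\tSample1.mzML\nPEPTIDEK\t9000.0\n"


def summed_ms2_intensity(sage_tsv, sample):
    """Sum the (assumed) ms2_intensity column per peptide for one file."""
    totals = defaultdict(float)
    for row in csv.DictReader(sage_tsv, delimiter="\t"):
        if row["filename"] == sample:
            totals[row["peptide"]] += float(row["ms2_intensity"])
    return dict(totals)


def lfq_intensity(lfq_tsv, sample):
    """Read the (assumed) per-file intensity column from the lfq table."""
    reader = csv.DictReader(lfq_tsv, delimiter="\t")
    return {row["peptide"]: float(row[sample]) for row in reader}


def compare(sage_tsv, lfq_tsv, sample):
    """Pair both values for every peptide present in both tables."""
    ms2 = summed_ms2_intensity(sage_tsv, sample)
    lfq = lfq_intensity(lfq_tsv, sample)
    return {p: (ms2[p], lfq[p]) for p in sorted(ms2.keys() & lfq.keys())}


print(compare(io.StringIO(SAGE_TSV), io.StringIO(LFQ_TSV), "Sample1.mzML"))
```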

I'll keep an eye out for the update.

Best, Simon

s-imonb commented 3 weeks ago

Here they are.

Sample1.sage.txt lfq.txt

hbarsnes commented 1 week ago

SearchGUI v4.3.10 has just been released which solves the problem with the Sage batch size parameter.

I have not had the time to look into the question about keeping the additional Sage output files and will therefore keep the issue open. I cannot guarantee that I will be able to look at this in the near future though.

s-imonb commented 1 week ago

Hi Harald,

Thanks for the quick update (and for the great tool in general). I grabbed the latest version and I can confirm it no longer gives the batch size error. Plus I'm getting an output file (sage.tsv) for each sample whereas I was only getting one file before.

However, it's still not parallelizing as I would expect. I checked the sage.json and it looks like SearchGUI is feeding Sage one file at a time. It's entirely possible I have messed up a setting somewhere; I have the core count set to 15 in both the Java settings and the Sage settings. Here's the relevant line in the JSON (I have 7 files selected in SearchGUI):

... "mzml_paths": ["C:\path\to\file\blank_01.mzml"] }
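For a run outside SearchGUI, the same config could be edited so that `mzml_paths` lists every file. The sketch below assumes that Sage processes all entries of that list, which the `[mzml_paths]...` usage line in the error message suggests but which is not confirmed in this thread. The helper name and the file names `blank_02` to `blank_07` are hypothetical; the rest of the config is left out, as in the line above.

```python
import json


def with_all_spectrum_files(config, mzml_files):
    """Return a copy of a Sage config dict whose mzml_paths lists every file."""
    updated = dict(config)
    updated["mzml_paths"] = [str(path) for path in mzml_files]
    return updated


# Only the key quoted in the post; all other settings are omitted here.
config = {"mzml_paths": [r"C:\path\to\file\blank_01.mzml"]}

# Seven hypothetical file names standing in for the seven selected files.
files = [rf"C:\path\to\file\blank_{i:02d}.mzml" for i in range(1, 8)]

# json.dumps escapes each backslash, which valid JSON requires.
print(json.dumps(with_all_spectrum_files(config, files), indent=2))
```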

Thanks, Simon

hbarsnes commented 4 days ago

Hi Simon,

I'm afraid that's how SearchGUI has been implemented, i.e. it searches one spectrum file at a time. Basically, this was the only option supported by the search engines when we implemented the first version of SearchGUI a long time ago. So while some of the search engines may support multiple files as input, this is not the case for all of them.

However, I'm not sure how much would be gained in practice. It may even be faster to use all the resources on one file at a time. I've never tested this, but it sounds like something that would depend very much on whether the given search engine is optimized for parallel processing of multiple files.
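Whether batching helps could be checked empirically with a small timing harness along these lines. This is a sketch only: `run` is a hypothetical callable that would launch the search engine on a list of files (for example through `subprocess.run`), and a sleeping stand-in replaces it here, so the printed numbers say nothing about Sage itself.

```python
import time


def time_strategy(run, groups):
    """Call run(files) once per group and return the total wall-clock seconds."""
    start = time.perf_counter()
    for files in groups:
        run(files)
    return time.perf_counter() - start


def compare_strategies(run, files):
    """Time one call per file against a single call with all files."""
    one_at_a_time = time_strategy(run, [[f] for f in files])
    all_together = time_strategy(run, [list(files)])
    return {"one_at_a_time": one_at_a_time, "all_together": all_together}


def fake_run(files):
    # Stand-in for launching the search engine on the given files.
    time.sleep(0.001 * len(files))


print(compare_strategies(fake_run, ["a.mzml", "b.mzml", "c.mzml"]))
```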

Best regards, Harald