compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines
http://compomics.github.io/projects/searchgui.html

support for sage variable and fixed modifications by mass #369

Closed: kostrouc closed this issue 4 months ago

kostrouc commented 5 months ago

Hi,

I want to perform a Sage search using hundreds of variable modifications, which Sage has recently added support for. The problem is that Sage does not support running from a temporary directory, so I run out of RAM very quickly and cannot complete a search with this many modifications. SearchCLI allows a temporary directory to be used, which solves part of my problem. Could the modifications in the SearchCLI .par file be listed by mass instead of by unique name, as is done in the Sage json format?

Sage json format specifies fixed and variable mods like this:

  "static_mods": {
    "C": 57.0215
  },
  "variable_mods": {
    "[": [568.21157, 730.264394, 349.137282],
    "D": [37.94694080186, 52.91146120379, 14.01565006414],
    "E": [14.01565006414],
    "K": [42.0105646837, 28.03130012828, 162.0528234185, 100.01604398696, 42.04695019242, 114.04292744114],
    "M": [15.9949, 15.99491461956],
    "R": [28.03130012828],
    "S": [203.079373, 406.158746, 365.132196, 203.07937251951, 14.01565006414, 79.9663305207499],
    "T": [365.13219593801, 203.07937251951]
  }
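For illustration only, here is a minimal Python sketch (not part of SearchCLI or Sage) that builds the two mappings above from flat lists of (site, mass) pairs, grouping every variable mass for the same residue or terminal into one list. Because a JSON object should not repeat a key, grouping by site before serializing avoids emitting the same residue twice. The function name build_sage_mods is hypothetical, and the sketch assumes, as in the excerpt, that each fixed modification maps a site to a single mass.

```python
import json
from collections import defaultdict


def build_sage_mods(fixed, variable):
    """Build Sage-style static_mods / variable_mods dicts from (site, mass) pairs.

    Hypothetical helper. fixed: list of (site, mass); one mass per site is assumed.
    variable: list of (site, mass); all masses for the same site are grouped.
    """
    static_mods = {}
    for site, mass in fixed:
        # A fixed modification cannot have two different masses on the same site.
        if site in static_mods and static_mods[site] != mass:
            raise ValueError(f"conflicting fixed modifications on {site!r}")
        static_mods[site] = mass

    grouped = defaultdict(list)
    for site, mass in variable:
        # Group masses per site, skipping exact repeats of the same value.
        if mass not in grouped[site]:
            grouped[site].append(mass)

    return {"static_mods": static_mods, "variable_mods": dict(grouped)}


if __name__ == "__main__":
    mods = build_sage_mods(
        fixed=[("C", 57.0215)],
        variable=[("S", 203.079373), ("S", 79.9663305207499), ("M", 15.9949)],
    )
    print(json.dumps(mods, indent=2))
```

Grouping in a dict keyed by site means each residue appears exactly once in the serialized output, however many masses are attached to it.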

Would it be possible to specify them in this way using the -fixed_mods and -variable_mods parameters?

Thank you!

hbarsnes commented 4 months ago

Given that SearchCLI is simply a wrapper around the Sage command line, I'm surprised that you cannot achieve the same result by running the Sage command line directly. Have you tried running the same Sage command line outside of SearchCLI? Or is there something I'm missing here?

With regard to the PTMs, the SearchGUI/SearchCLI implementation for Sage currently supports only one PTM per residue/terminal. Extending this has been on our TODO list for a while, though. I will try to find some time to look into it later this week and let you know.

kostrouc commented 4 months ago

When I run Sage directly, the job is killed because it runs out of RAM, and the Sage developers mention that they have not implemented database splitting: https://github.com/bigbio/quantms/issues/327

Sage added support for multiple modifications in May 2023: https://github.com/lazear/sage/issues/48

hbarsnes commented 4 months ago

> When I run Sage directly, the job is killed because it runs out of RAM, and the Sage developers mention that they have not implemented database splitting: https://github.com/bigbio/quantms/issues/327

I'm afraid that you will most likely see the same issue when running Sage via SearchCLI/SearchGUI. The only reason this may not be the case in the current release is that we support only one PTM per residue when writing the Sage json file, so Sage does not search for all of the PTMs specified. Support for multiple PTMs per residue has now been implemented, however, and a new version of SearchGUI will be released shortly.

In any case, if you want to search for hundreds of variable modifications, you are probably better off using an open modification search, for example MetaMorpheus.
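To get a feel for why the search space grows so quickly, the sketch below counts the modified forms of a single peptide when each residue can carry at most one variable modification and at most max_mods residues are modified per peptide. It is a simplified combinatorial model that ignores terminal sites and any further limits Sage applies; count_modified_forms and max_mods are hypothetical names, and the cap is only meant to mimic a per-peptide limit such as the max_variable_mods option in Sage's configuration (assumed here; check the Sage documentation).

```python
def count_modified_forms(peptide, var_mods, max_mods):
    """Count modified forms of one peptide, including the unmodified form.

    Simplified model (hypothetical helper): each residue carries at most one
    variable mod chosen from var_mods[residue]; at most max_mods residues are modified.
    """
    # coeffs[k] = number of ways to place exactly k modifications on the residues seen so far.
    coeffs = [1] + [0] * max_mods
    for residue in peptide:
        options = len(var_mods.get(residue, []))
        if options == 0:
            continue
        # Multiply the polynomial by (1 + options * x), truncated at degree max_mods.
        # Iterating k downwards lets us update in place without double counting.
        for k in range(max_mods, 0, -1):
            coeffs[k] += coeffs[k - 1] * options
    return sum(coeffs)


if __name__ == "__main__":
    # One mass per residue versus the number of masses per residue in the json
    # excerpt (6 for S, combining every S mass listed).
    one_each = {aa: [0.0] for aa in "SKTEMD"}
    many = {"S": [0.0] * 6, "K": [0.0] * 6, "T": [0.0] * 2,
            "E": [0.0], "M": [0.0] * 2, "D": [0.0] * 3}
    print(count_modified_forms("SKTEMDSK", one_each, 2))  # 37
    print(count_modified_forms("SKTEMDSK", many, 2))      # 464
```

Under this model a single 8-residue peptide goes from 37 forms to 464, and each form contributes its own fragments to the index, which is why a plain closed search with hundreds of variable modifications scales so poorly.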

hbarsnes commented 4 months ago

SearchGUI v4.3.6 has now been released, adding support for multiple modifications per residue/terminal for Sage. As mentioned above, I do not think it will make a difference compared to running the standalone version of Sage, but at least the modifications should now be correctly annotated in the Sage json file.

lazear commented 4 months ago

@kostrouc You are going to have a very bad time running that search, regardless of the amount of RAM: fragment indexing requires generating all of those peptides and fragments ahead of time, which will explode the search space and dramatically impede searching and rescoring. I would suggest running several independent searches with subsets of those modifications to see whether you are actually getting IDs for them. You could also combine the results from those sub-searches (e.g., by taking the best posterior_error per spectrum). Alternatively, shoot me an email; I might have a workaround.
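As a rough illustration of combining sub-search results, here is a Python sketch that keeps, for each spectrum, the PSM with the lowest posterior_error across several result sets. It assumes Sage-style tab-separated output with filename, scannr and posterior_error columns, and that a lower posterior_error is better; read_sage_tsv and best_psm_per_spectrum are hypothetical names, and the sketch does not re-estimate FDR on the merged set.

```python
import csv


def read_sage_tsv(path):
    """Yield rows of a tab-separated results file as dicts (column names assumed)."""
    with open(path, newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def best_psm_per_spectrum(runs):
    """Merge several result sets, keeping the lowest-posterior_error PSM per spectrum.

    runs: iterable of iterables of row dicts, e.g. one read_sage_tsv(...) per sub-search.
    A spectrum is identified by its (filename, scannr) pair.
    """
    best = {}
    for rows in runs:
        for row in rows:
            key = (row["filename"], row["scannr"])
            score = float(row["posterior_error"])
            # Replace the stored PSM only if this one has a strictly lower posterior_error.
            if key not in best or score < float(best[key]["posterior_error"]):
                best[key] = row
    return best
```

Usage might look like best_psm_per_spectrum(read_sage_tsv(p) for p in paths), where paths is a hypothetical list of per-subset result files. Keying on (filename, scannr) keeps spectra from different raw files apart even when their scan numbers coincide.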

kostrouc commented 4 months ago

Does an open search with a wide window require the var_mods it searches for to be specified in the json? If so, how does it differ from the original search?

I find that the example open-search json only finds the one var_mod and the static_mod it specifies, but when I add a couple of PTMs to var_mods in that example json, it then finds those as well.

lazear commented 4 months ago

Open search will not annotate any modifications in the peptide sequence (unless they are specified as var mods). Instead, you need to compute a new column, expmass - calcmass, which will contain the delta mass offsets. You could then filter for all PSMs whose delta mass offset is within a tolerance of one of the modifications in your list.
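A minimal Python sketch of that filtering step, assuming each PSM is a dict with expmass and calcmass values (as strings or numbers) and using an absolute tolerance in daltons. annotate_delta_mass and tol_da are hypothetical names, the 0.01 Da default is an arbitrary placeholder rather than a recommended value, and isotope-error offsets are not accounted for.

```python
def annotate_delta_mass(psms, mod_masses, tol_da=0.01):
    """Keep PSMs whose delta mass (expmass - calcmass) matches a listed modification.

    psms: iterable of dicts with 'expmass' and 'calcmass' entries (str or float).
    mod_masses: modification masses in daltons to look for.
    tol_da: absolute tolerance in daltons (placeholder default).
    Returns copies of the matching PSMs with 'delta_mass' and 'matched_mods' added.
    """
    kept = []
    for psm in psms:
        delta = float(psm["expmass"]) - float(psm["calcmass"])
        # All listed modification masses within tolerance of the observed offset.
        matches = [m for m in mod_masses if abs(delta - m) <= tol_da]
        if matches:
            kept.append({**psm, "delta_mass": delta, "matched_mods": matches})
    return kept
```

A linear scan over mod_masses is fine for a few hundred masses; for much longer lists, sorting the masses and using bisect would avoid comparing every PSM against every entry.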