Closed iquasere closed 3 years ago
Trying with the Windows GUI, the results are the same. X!Tandem finishes and can build several models (in the example above it built 0 models, but it was fine with other files), while MyriMatch and MS-GF+ had the same problems. I was trying to use these versions because they are the most recent available through Bioconda, and I had problems using SearchGUI 4 and PeptideShaker 2 through conda.
MyriMatch claims there is a duplicated protein id in the database, which doesn't make sense
I'm afraid MyriMatch has its own internal FASTA header parsing, which we cannot control. This is usually not an issue, but in your example it seems to assume that "WP_100909616.1" is the accession number. I think the only way around this would be to reformat your FASTA headers to make MyriMatch happy, ideally using our non-standard FASTA format.
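Before reformatting, it can help to see which headers collide. The following is a minimal sketch, not MyriMatch's actual parser: it assumes the first whitespace-delimited token of each header is what gets treated as the accession (as the "WP_100909616.1" example suggests), and the function names are made up for illustration.

```python
from collections import Counter


def header_id(header):
    """Return the first whitespace-delimited token of a FASTA header line.

    Assumption: this token is what the search engine uses as the accession.
    """
    parts = header.lstrip(">").split(None, 1)
    return parts[0] if parts else ""


def duplicated_ids(lines):
    """Return the sorted list of header IDs that occur more than once."""
    counts = Counter(header_id(line) for line in lines if line.startswith(">"))
    return sorted(acc for acc, n in counts.items() if n > 1)


# Usage on a real database (path is whatever your FASTA file is called):
# with open("database_concatenated_target_decoy.fasta") as handle:
#     print(duplicated_ids(handle))
```

Any ID reported here appears on more than one header under that assumption, which is where the header reformatting would need to make the accessions distinct.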
MS-GF+ claims to run out of memory when creating suffixes
MS-GF+ uses the same memory settings as the ones given to SearchGUI. So if you are using the command line you have to add the -Xmx option to give Java, and consequently MS-GF+, more memory. In the GUI version of SearchGUI you can increase the memory provided via Edit > Java Settings.
I guess 1 Gb is insufficient for MS-GF+ because of the size of the database (930648 sequences). In the parameters file there is nothing that allows setting more memory, where could I tweak this through SearchCLI?
In the parameters file there is nothing that allows setting more memory, where could I tweak this through SearchCLI?
As mentioned above you simply have to add the standard -Xmx Java option to your SearchCLI command line, e.g.:
java -Xmx2048M -cp SearchGUI-X.Y.Z.jar eu.isas.searchgui.cmd.SearchCLI [parameters]
Ok, I see it now. Running searchgui through the Bioconda symlink uses 4 Gb of memory, but when running SearchCLI it likely defaults to 1 Gb. This value can only be changed by not using the symlink and calling the script directly:
~/anaconda3/envs/proteomics/bin/java -splash:resources/conf/searchgui-splash.png -Xms128M -Xmx4096M -cp ~/anaconda3/envs/proteomics/share/searchgui-3.3.9-1/SearchGUI-3.3.9.jar eu.isas.searchgui.cmd.SearchCLI -spectrum_files metaproteomics/test -output_folder metaproteomics -id_params metaproteomics/params.par -threads 14 -xtandem 1 -myrimatch 1 -msgf 1
uses 4 Gb, but
searchgui eu.isas.searchgui.cmd.SearchCLI -spectrum_files metaproteomics/test -output_folder metaproteomics -id_params metaproteomics/params.par -threads 14 -xtandem 1 -myrimatch 1 -msgf 1
will always use 1 Gb. If I'm not mistaken, I cannot use the symlink with a different memory setting. If so, this could be a parameter for a future version - if it isn't already!
I'm not familiar with the conda setup myself, but you can verify how much memory is given to MS-GF+ by checking the SearchGUI log file where you will see the exact MS-GF+ command line used.
I will check with the developer in charge of the conda setup and get back to you.
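For the log check mentioned above, a small script can pull out every -Xmx value that appears in the log. This is only a sketch: it assumes the MS-GF+ command line recorded in the SearchGUI log contains a standard Java -Xmx option, and the function name and log path are placeholders.

```python
import re

# Matches Java heap options such as -Xmx1024M, -Xmx4g or -Xmx2048000k.
XMX_PATTERN = re.compile(r"-Xmx(\d+)([kKmMgG]?)")

# Multipliers converting each unit suffix to bytes (no suffix means bytes).
UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_settings(log_lines):
    """Return (line_number, heap_bytes) for every -Xmx option in the log."""
    found = []
    for number, line in enumerate(log_lines, start=1):
        for match in XMX_PATTERN.finditer(line):
            size = int(match.group(1)) * UNIT_BYTES[match.group(2).lower()]
            found.append((number, size))
    return found


# Usage (replace the path with your actual SearchGUI log file):
# with open("path/to/searchgui.log") as handle:
#     for number, size in xmx_settings(handle):
#         print(f"line {number}: {size / 1024 ** 2:.0f} MB")
```

If the value reported for the MS-GF+ line is still around 1 Gb, the -Xmx option did not reach the Java process that launches the search.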
It seems like all you have to do is add the Xmx option there as well, i.e.
searchgui eu.isas.searchgui.cmd.SearchCLI -Xmx4096M -spectrum_files [...]
You are right, sorry for the hassle xD MS-GF+ works perfectly that way. And for MyriMatch, gonna have to shape those IDs. Thank you very much for the assistance!
I am trying to perform PSM with SearchCLI. The inputs are peak-picked MGF datasets, and I am trying to use MyriMatch, X!Tandem and MS-GF+. Both MyriMatch and MS-GF+ fail, with X!Tandem managing to finish the matching.
MyriMatch claims there is a duplicated protein id in the database, which doesn't make sense:
grep 'WP_100909616.1' /mnt/HDDStorage/jsequeira/metaproteomics/database_concatenated_target_decoy.fasta
gives
MS-GF+ claims to run out of memory when creating suffixes:
SearchGUI version: 3.3.9; Java version: openjdk 11.0.1
This is the full log for one file.