MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

java.lang.NullPointerException when running MS-GF+ #13

Open RiegardtJohnson opened 7 years ago

RiegardtJohnson commented 7 years ago

I ran an MS-GF+(v2017.01.13) search using SearchGUI, and received the following errors when the output files were being generated:

Writing results...
java.lang.NullPointerException
    at edu.ucsd.msjava.mzid.MZIdentMLGen.getDBSequence(MZIdentMLGen.java:661)
    at edu.ucsd.msjava.mzid.MZIdentMLGen.getPeptideEvidenceList(MZIdentMLGen.java:619)
    at edu.ucsd.msjava.mzid.MZIdentMLGen.addSpectrumIdentificationResults(MZIdentMLGen.java:347)
    at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:397)
    at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:106)
    at edu.ucsd.msjava.ui.MSGFPlus.main(MSGFPlus.java:57)

The search finishes without reporting any other errors, but no output .mzid files are generated. The command used to run the search was:

/home/user/anaconda2/jre/bin/java -Xmx50g -jar /run/media/user/Data/rmj_proteomics/SearchGUI-3.2.18/resources/MS-GF+/MSGFPlus.jar -s /run/media/user/Data/rmj_proteomics/proteomics/RECONVERTED/RJ_FC2_DCE.mgf -d /run/media/user/Data/rmj_proteomics/TREMBL_database/nr_fungal/nr_fungal_concatenated_target_decoy.fasta -o /run/media/user/Data/rmj_proteomics/proteomics/nr_fungal_lin/.SearchGUI_temp/RJ_FC2_DCE.msgf.mzid -t 10.0ppm -tda 0 -mod /run/media/user/Data/rmj_proteomics/SearchGUI-3.2.18/resources/MS-GF+/params/Mods.txt -minCharge 2 -maxCharge 6 -inst 3 -thread 23 -m 3 -e 1 -ntt 2 -protocol 0 -minLength 8 -maxLength 45 -n 10 -addFeatures 0 -ti 0,4

Can you advise on how to resolve this error?

Kind regards, Riegardt Johnson

alchemistmatt commented 7 years ago

That's useful information that you provided, but it's not enough for us to solve the problem. It may be related to the protein names or protein sequences in the FASTA file, but without the actual files, we won't be able to diagnose. Please send SearchGUI-3.2.18/resources/MS-GF+/params/Mods.txt along with a portion of the .mgf file (e.g. a sampling of 25 spectra from the middle of the scan range) to proteomics@pnnl.gov

alchemistmatt commented 7 years ago

Also, please provide us info on where you obtained the TREMBL nr_fungal FASTA file. It would also be helpful if you sent us a portion of your FASTA file, including both the normal proteins and the decoy proteins that you added. This will let us see the format you're using for protein names, descriptions, and sequences.

I'm going to guess you're using uniprot_trembl_fungi.dat.gz from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/ but please confirm.

alchemistmatt commented 7 years ago

If you're using the full-size TREMBL nr_fungal FASTA file, I'm frankly surprised that MSGF+ is not running out of memory. We have found that when FASTA files get larger than ~800 MB, we get memory usage issues (in that the system requires 16 GB of memory or more, scaling with FASTA file size). In cases like that we split the FASTA file into multiple parts, run MSGF+ once on each FASTA file part, then merge the results together.

The May 2017 release of uniprot_trembl_fungi.dat has 6.6 million proteins, giving a 4 GB FASTA file. The decoy version of that is 8 GB. I see you're allocating 50 GB via java -Xmx50g, so hopefully that's enough memory, but I suggest you first get things working with a sampling of that huge FASTA file. Something like:

head -5000000 uniprot_trembl_fungi.fasta > uniprot_trembl_fungi_excerpt.fasta
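Cutting by line count with head can end the excerpt partway through a protein record. Below is a minimal Python sketch of splitting a FASTA file at record boundaries instead, in the spirit of the split-and-search approach described above. The function names, the output file naming, the 60-character line wrap, and the residue budget are illustrative assumptions; none of this is part of MS-GF+.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def assign_parts(records, max_residues):
    """Yield (part_index, header, sequence), starting a new part whenever
    adding a record would push the current part past max_residues.
    A single record longer than the budget still gets a part of its own."""
    part, count = 0, 0
    for header, seq in records:
        if count and count + len(seq) > max_residues:
            part += 1
            count = 0
        count += len(seq)
        yield part, header, seq


def write_parts(in_path, out_prefix, max_residues, width=60):
    """Stream in_path into <out_prefix>_partN.fasta files; return their paths."""
    out, current, paths = None, -1, []
    with open(in_path) as src:
        for part, header, seq in assign_parts(read_fasta(src), max_residues):
            if part != current:
                if out is not None:
                    out.close()
                paths.append(f"{out_prefix}_part{part + 1}.fasta")
                out = open(paths[-1], "w")
                current = part
            out.write(header + "\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
    if out is not None:
        out.close()
    return paths
```

A call such as write_parts("uniprot_trembl_fungi.fasta", "fungi", 200_000_000) would write fungi_part1.fasta, fungi_part2.fasta, and so on; the budget of 200 million residues is only a placeholder to tune against available memory.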

Stortebecker commented 6 years ago

I got a similar error when searching an mzML file that had undergone peak picking at the MS2 level with the OpenMS tool PeakPickerHiRes. When I instead use the vendor peak picking provided by MSConvert, MS-GF+ runs without any error.

You can find the database, the original file and the vendor-peak-picked file here. I uploaded the PeakPickerHiRes output to Dropbox.

The command I ran: java -jar MSGFPlus.jar -s PeakPickerHiRes_on_qExactive01819.mzml -d Human_database_cRAP_added.fasta -t 10ppm

The error I got:

Loading database finished (elapsed time: 20,16 sec)
Reading spectra...
java.lang.NullPointerException
    at edu.ucsd.msjava.msutil.Spectrum.getCharge(Spectrum.java:124)
    at edu.ucsd.msjava.msutil.SpecKey.getSpecKeyList(SpecKey.java:91)
    at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:220)
    at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:105)
    at edu.ucsd.msjava.ui.MSGFPlus.main(MSGFPlus.java:56)

hroest commented 6 years ago

@Stortebecker maybe this is related to https://github.com/OpenMS/OpenMS/pull/3082

@alchemistmatt is it possible that MSGF+ relies on optional elements in the mzML file?

FarmGeek4Life commented 5 years ago

@Stortebecker That file has no charge state information for the precursors, which is what MS-GF+ is trying to read when it crashes. PeakPickerHiRes does not report the charge states, but as of 2014 there was work in progress to implement charge state determination/deconvolution algorithms as options in OpenMS, according to OpenMS issue #877.
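To check whether a given mzML file is affected before starting a search, the spectra can be scanned for precursors that lack a charge state. The sketch below is a rough Python check that assumes the charge is stored as a cvParam with the PSI-MS accession MS:1000041 ("charge state") somewhere under the spectrum element; the function names are made up for illustration, and this is not how MS-GF+ itself reads the file.

```python
import io
import xml.etree.ElementTree as ET

CHARGE_STATE = "MS:1000041"  # PSI-MS controlled vocabulary term "charge state"


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def spectra_missing_charge(source):
    """Return ids of spectra that have a precursor but no charge state cvParam.

    source may be a file path or a binary file-like object."""
    missing = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "spectrum":
            continue
        has_precursor = False
        has_charge = False
        for node in elem.iter():
            name = local(node.tag)
            if name == "precursor":
                has_precursor = True
            elif name == "cvParam" and node.get("accession") == CHARGE_STATE:
                has_charge = True
        if has_precursor and not has_charge:
            missing.append(elem.get("id"))
        elem.clear()  # release the children of spectra already inspected
    return missing


def spectra_missing_charge_in_text(xml_text):
    """Convenience wrapper for checking a small mzML snippet held in a string."""
    return spectra_missing_charge(io.BytesIO(xml_text.encode("utf-8")))
```

Calling spectra_missing_charge("PeakPickerHiRes_on_qExactive01819.mzml") would list the spectrum ids without a precursor charge; an empty list suggests the crash has a different cause.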

@RiegardtJohnson: This is a problem with the implementation of the search in MS-GF+, combined with a limitation of Java. Java uses a signed 32-bit integer as the index for an array, which limits an array to about 2.147 billion entries; MS-GF+ accesses all peptides in the FASTA file in a way that means each residue is one entry in an array. Your database file, at 4 GB, is big enough to have this problem for a target-only or decoy-only search; when creating the concatenated target/decoy file for a combined target and decoy search, the number of residues is doubled, which doesn't make it any easier.
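A rough way to see whether a database is near this limit is to count its residues and compare the total with Java's Integer.MAX_VALUE (2^31 - 1). The Python sketch below does that; the function names are illustrative, and the point at which MS-GF+ actually fails may be lower than this bound, since the exact layout of its internal array is not described in this thread.

```python
JAVA_MAX_ARRAY_INDEX = 2**31 - 1  # Integer.MAX_VALUE in Java


def count_residues(lines):
    """Count sequence characters in FASTA lines, skipping headers and blanks."""
    total = 0
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def exceeds_java_int_index(residues, with_decoys=False):
    """Return True if the residue count passes the 32-bit index bound.

    with_decoys doubles the count, mirroring a concatenated target/decoy file
    built from a target-only database."""
    effective = residues * 2 if with_decoys else residues
    return effective > JAVA_MAX_ARRAY_INDEX
```

Because open files iterate line by line, count_residues(open("database.fasta")) streams the file without loading it into memory; database.fasta is a placeholder name.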

Wang-kaifei commented 7 months ago

Dear Developers,

I had the same problem recently. I was using a 14 GB FASTA file, and from reading the replies above I realised that I need to split the database into parts for searching.

Because a single MS/MS spectrum can be matched to different peptides in different searches, it seems to me that the results of these searches cannot simply be concatenated.

So I wonder if there is an official tool for merging the results from these sliced searches?

The command I use is: java -Xms150G -Xmx210G -jar MSGFPlus.jar -conf param_file_path

I am using the software version: MSGFPlus_v20230112

Any replies will be appreciated!

alchemistmatt commented 7 months ago

Use the MzidMerger to combine .mzid files from separate MS-GF+ searches of the same instrument file
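For intuition about why plain concatenation is not enough, the core of any such merge is to pool the candidate matches for each spectrum across the split searches and re-rank them. The Python sketch below illustrates that idea only. It is not MzidMerger's algorithm, the dictionary keys are hypothetical, and it assumes every hit carries a spectrum-level score that is comparable across searches (MS-GF+ reports a SpecEValue, where lower is better). Statistics that depend on the whole database would presumably still need to be recomputed on the combined result.

```python
def merge_hits(searches, top_n=1):
    """Pool hits from several searches and keep the best top_n per spectrum.

    searches: iterable of lists of dicts with the hypothetical keys
    'spectrum_id', 'peptide' and 'spec_evalue' (lower is better)."""
    by_spectrum = {}
    for hits in searches:
        for hit in hits:
            by_spectrum.setdefault(hit["spectrum_id"], []).append(hit)
    merged = {}
    for spectrum_id, hits in by_spectrum.items():
        ranked = sorted(hits, key=lambda h: h["spec_evalue"])
        merged[spectrum_id] = ranked[:top_n]
    return merged
```

Here a spectrum matched to one peptide in the first database part and to a better-scoring peptide in the second ends up with only the better match, which simple concatenation would not guarantee.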

Wang-kaifei commented 7 months ago

Use the MzidMerger to combine .mzid files from separate MS-GF+ searches of the same instrument file

Thanks a lot, I will try it!

Wang-kaifei commented 7 months ago

Dear,

I've got another problem. When I use the command: dotnet /data/liuqingxiu/wkf/MSGFMerge/net5.0/MzidMerger.exe -inDir a -out b, I receive the following error:

Error: An assembly specified in the application dependencies manifest (MzidMerger.deps.json) has already been found but with a different file extension: package: 'MzidMerger', version: '1.3.1' path: 'MzidMerger.dll' previously found assembly: '/data/liuqingxiu/wkf/MSGFMerge/net5.0/MzidMerger.exe'

I'm using Ubuntu 20.04 with dotnet version 5.0.408.

Any replies will be appreciated!

FarmGeek4Life commented 7 months ago

I think you need to use "dotnet /data/liuqingxiu/wkf/MSGFMerge/net5.0/MzidMerger.dll -inDir a -outDir b", pointing dotnet at the .dll rather than the .exe. I can't say with certainty, but I know the .exe is designed to be run standalone, so it's probably not the correct file to specify there, and all of the online examples show the use of a .dll.


