chhh / MSFragger-GUI

The project has migrated to https://github.com/Nesvilab/FragPipe
GNU General Public License v3.0

mzML search crashes with IO error #1

Closed stharan closed 6 years ago

stharan commented 6 years ago

Describe the problem

System info

You can find that printed on the Config tab.


Describe your experiment

HEK 293 cell lysate run with a 90 min gradient on a timsTOF Pro in PASEF mode

General proteomics experiment description

e.g. "TMT, Human, full cell lysate with Trypsin" , "AP-MS pulldowns, mouse, liver tissue"

...

Input data files

A single mzML file, 192 GB in size

Sequence database

UniProt human database downloaded with Philosopher


Attach fragger.params file

num_threads = 0 # 0=poll CPU to set num threads; else specify num threads directly (max 64)
precursor_mass_tolerance = 500.00
precursor_mass_lower = -100 # Overrides the lower bound of the window set by precursor_mass_tolerance
precursor_mass_upper = 100 # Overrides the upper bound of the window set by precursor_mass_tolerance
precursor_mass_units = 1 # 0=Daltons, 1=ppm
precursor_true_tolerance = 20
precursor_true_units = 1 # 0=Daltons, 1=ppm
fragment_mass_tolerance = 20
fragment_mass_units = 1 # 0=Daltons, 1=ppm
isotope_error = 0/1/2 # 0=off, -1/0/1/2/3 (standard C13 error)
mass_offsets = 0 # allow for additional precursor mass window shifts. Multiplexed with isotope_error. mass_offsets = 0/79.966 can be used as a restricted ‘open’ search that looks for unmodified and phosphorylated peptides (on any residue)
search_enzyme_name = Trypsin
search_enzyme_cutafter = KR
search_enzyme_butnotafter = P
num_enzyme_termini = 2 # 2 for enzymatic, 1 for semi-enzymatic, 0 for nonspecific digestion
allowed_missed_cleavage = 2 # maximum value is 5
clip_nTerm_M = 1
variable_mod_01 = 15.99490 M
variable_mod_02 = 42.01060 [^
variable_mod_03 = 79.96633 STY
variable_mod_04 = -17.02650 nQnC
variable_mod_05 = -18.01060 nE
allow_multiple_variable_mods_on_residue = 1 # static mods are not considered
max_variable_mods_per_mod = 3 # maximum of 5
max_variable_mods_combinations = 5000 # maximum of 65534, limits number of modified peptides generated from sequence
output_file_extension = pepXML
output_format = pepXML
output_report_topN = 1
output_max_expect = 50.0
precursor_charge = 0 0 # precursor charge range to analyze; does not override any existing charge; 0 as 1st entry ignores parameter
override_charge = 0 # 0=no, 1=yes to override existing precursor charge states with precursor_charge parameter
digest_min_length = 7
digest_max_length = 50
digest_mass_range = 500.0 5000.0 # MH+ peptide mass range to analyze
max_fragment_charge = 2 # set maximum fragment charge state to analyze (allowed max 5)
track_zero_topN = 0 # in addition to topN results, keep track of top results in zero bin
zero_bin_accept_expect = 0 # boost top zero bin entry to top if it has expect under 0.01 - set to 0 to disable
zero_bin_mult_expect = 1 # disabled if above passes - multiply expect of zero bin for ordering purposes (does not affect reported expect)
add_topN_complementary = 0
minimum_peaks = 15 # required minimum number of peaks in spectrum to search (default 10)
use_topN_peaks = 100
min_fragments_modelling = 3
min_matched_fragments = 4
minimum_ratio = 0.01 # filter peaks below this fraction of strongest peak
clear_mz_range = 0.0 0.0 # for iTRAQ/TMT type data; will clear out all peaks in the specified m/z range
add_Cterm_peptide = 0.000000
add_Nterm_peptide = 0.000000
add_Cterm_protein = 0.000000
add_Nterm_protein = 0.000000
add_G_glycine = 0.000000
add_A_alanine = 0.000000
add_S_serine = 0.000000
add_P_proline = 0.000000
add_V_valine = 0.000000
add_T_threonine = 0.000000
add_C_cysteine = 57.021464
add_L_leucine = 0.000000
add_I_isoleucine = 0.000000
add_N_asparagine = 0.000000
add_D_aspartic_acid = 0.000000
add_Q_glutamine = 0.000000
add_K_lysine = 0.000000
add_E_glutamic_acid = 0.000000
add_M_methionine = 0.000000
add_H_histidine = 0.000000
add_F_phenylalanine = 0.000000
add_R_arginine = 0.000000
add_Y_tyrosine = 0.000000
add_W_tryptophan = 0.000000
add_B_user_amino_acid = 0.000000
add_J_user_amino_acid = 0.000000
add_O_user_amino_acid = 0.000000
add_U_user_amino_acid = 0.000000
add_X_user_amino_acid = 0.000000
add_Z_user_amino_acid = 0.000000
database_name = C:\MSfragger\2018-06-22-td-UP000005640.fas
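
For anyone comparing parameter files while debugging, the `key = value # comment` layout of fragger.params is simple to read programmatically. The following is a minimal sketch, not part of MSFragger: `parse_params` is a hypothetical helper name, and the sample text is a handful of lines copied from the file above.

```python
def parse_params(text):
    """Parse 'key = value  # comment' lines into a dict of strings."""
    params = {}
    for raw in text.splitlines():
        # Drop everything after the first '#', so '=' signs inside comments are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


# A few lines taken from the fragger.params attached to this issue.
sample = """\
num_threads = 0 # 0=poll CPU to set num threads
precursor_mass_units = 1 # 0=Daltons, 1=ppm
search_enzyme_cutafter = KR
digest_mass_range = 500.0 5000.0 # MH+ peptide mass range to analyze
database_name = C:\\MSfragger\\2018-06-22-td-UP000005640.fas
"""

params = parse_params(sample)
for key, value in params.items():
    print(f"{key}: {value}")
```

Values are kept as strings on purpose, since some entries (such as `digest_mass_range`) hold more than one number.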

Run log output

System info:
System OS: Windows 10, Architecture: AMD64
Java Info: 1.8.0_161, Java HotSpot(TM) 64-Bit Server VM, Oracle Corporation

Version info:
MSFragger-GUI version 6.0
MSFragger version 20171106
Philosopher version 20180530 (build 201805301641)

Will execute 9 commands:

java -jar C:\MSfragger\MSFragger-20171106\MSFragger-20171106.jar D:\MSfragger_test\closedSearch\fragger.params D:\MSfragger_test\HEK_200ng_90minPU3_default_Slot1-13_01_675.mzML

java -cp C:\MSfragger\MSFragger-GUI.exe umich.msfragger.util.FileMove D:\MSfragger_test\HEK_200ng_90minPU3_default_Slot1-13_01_675.pepXML D:\MSfragger_test\closedSearch\HEK_200ng_90minPU3_default_Slot1-13_01_675.pepXML

C:\MSfragger\philosopher_windows_amd64.exe workspace --clean

C:\MSfragger\philosopher_windows_amd64.exe workspace --init

C:\MSfragger\philosopher_windows_amd64.exe peptideprophet --decoy rev_ --decoyprobs --ppm --accmass --nonparam --expectscore --database C:\MSfragger\2018-06-22-td-UP000005640.fas D:\MSfragger_test\closedSearch\HEK_200ng_90minPU3_default_Slot1-13_01_675.pepXML

C:\MSfragger\philosopher_windows_amd64.exe proteinprophet --output interact interact-HEK_200ng_90minPU3_default_Slot1-13_01_675.pep.xml

C:\MSfragger\philosopher_windows_amd64.exe database --annotate C:\MSfragger\2018-06-22-td-UP000005640.fas --prefix rev_

C:\MSfragger\philosopher_windows_amd64.exe filter --sequential --tag rev_ --pepxml D:\MSfragger_test\closedSearch --protxml D:\MSfragger_test\closedSearch\interact.prot.xml

C:\MSfragger\philosopher_windows_amd64.exe report



Executing command:
$> java -jar C:\MSfragger\MSFragger-20171106\MSFragger-20171106.jar D:\MSfragger_test\closedSearch\fragger.params D:\MSfragger_test\HEK_200ng_90minPU3_default_Slot1-13_01_675.mzML 
Process started
MSFragger version MSFragger-20171106
(c) University of Michigan

Unknown parmameters:
    mass_offsets = 0

Peptide index read in 359ms
Selected fragment tolerance 0.10 Da and maximum fragment slice size of 20929.36MB

872784940 fragments to be searched in 1 slices (6.50GB total)
Operating on slice 1 of 1: 
12126ms
    HEK_200ng_90minPU3_default_Slot1-13_01_675.mzML 
java.lang.NegativeArraySizeException
    at umich.ms.fileio.filetypes.mzml.MZMLIndexParser.parse(MZMLIndexParser.java:97)
    at umich.ms.fileio.filetypes.mzml.MZMLFile.parseIndex(MZMLFile.java:65)
    at umich.ms.fileio.filetypes.mzml.MZMLFile.fetchIndex(MZMLFile.java:54)
    at umich.ms.fileio.filetypes.mzml.MZMLFile.fetchIndex(MZMLFile.java:31)
    at umich.ms.fileio.filetypes.xmlbased.AbstractXMLBasedDataSource.parse(AbstractXMLBasedDataSource.java:114)
    at umich.ms.datatypes.scancollection.impl.ScanCollectionDefault.loadData(ScanCollectionDefault.java:763)
    at umich.ms.datatypes.scancollection.impl.ScanCollectionDefault.loadData(ScanCollectionDefault.java:747)
    at o.b(Unknown Source)
    at n.d(Unknown Source)
    at n.a(Unknown Source)
    at MSFragger.main(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)

Error parsing: HEK_200ng_90minPU3_default_Slot1-13_01_675.mzML3502ms

Process finished, exit value: 0
Executing command:
$> java -cp C:\MSfragger\MSFragger-GUI.exe umich.msfragger.util.FileMove D:\MSfragger_test\HEK_200ng_90minPU3_default_Slot1-13_01_675.pepXML D:\MSfragger_test\closedSearch\HEK_200ng_90minPU3_default_Slot1-13_01_675.pepXML 
Process started
File does not exist: D:\MSfragger_test\HEK_200ng_90minPU3_default_Slot1-13_01_675.pepXML
Process finished, exit value: 1
Previous process returned exit code [1], cancelling further processing..
Cancelled execution of: 
C:\MSfragger\philosopher_windows_amd64.exe workspace --clean 
Cancelled execution of: 
C:\MSfragger\philosopher_windows_amd64.exe workspace --init 
Cancelled execution of: 
C:\MSfragger\philosopher_windows_amd64.exe peptideprophet --decoy rev_ --decoyprobs --ppm --accmass --nonparam --expectscore --database C:\MSfragger\2018-06-22-td-UP000005640.fas D:\MSfragger_test\closedSearch\HEK_200ng_90minPU3_default_Slot1-13_01_675.pepXML 
Cancelled execution of: 
C:\MSfragger\philosopher_windows_amd64.exe proteinprophet --output interact interact-HEK_200ng_90minPU3_default_Slot1-13_01_675.pep.xml 
Cancelled execution of: 
C:\MSfragger\philosopher_windows_amd64.exe database --annotate C:\MSfragger\2018-06-22-td-UP000005640.fas --prefix rev_ 
Cancelled execution of: 
C:\MSfragger\philosopher_windows_amd64.exe filter --sequential --tag rev_ --pepxml D:\MSfragger_test\closedSearch --protxml D:\MSfragger_test\closedSearch\interact.prot.xml 
Cancelled execution of: 
C:\MSfragger\philosopher_windows_amd64.exe report 
=========================
===
===        Done
===
=========================
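
The log above shows the search step printing a parse error yet finishing with exit value 0, so the run only stops one step later, when the file move returns 1 and the remaining commands are cancelled. A rough sketch of that kind of exit-code gating follows. It is not the GUI's actual code: `run_pipeline` is a hypothetical name, and the three steps are stand-in Python one-liners rather than the real commands.

```python
import subprocess
import sys


def run_pipeline(commands):
    """Run commands in order; after the first non-zero exit, cancel the rest."""
    results = []
    failed = False
    for cmd in commands:
        if failed:
            results.append((cmd, None))  # cancelled, never started
            continue
        code = subprocess.run(cmd).returncode
        results.append((cmd, code))
        failed = code != 0
    return results


py = sys.executable
steps = [
    # Stand-in for a step that reports an error but still exits with 0.
    [py, "-c", "print('Error parsing: input.mzML')"],
    # Stand-in for a step that fails because an expected file is missing.
    [py, "-c", "import sys; sys.exit(1)"],
    # Stand-in for a later step, which should be cancelled.
    [py, "-c", "print('never runs')"],
]

codes = [code for _, code in run_pipeline(steps)]
print(codes)
```

Gating on exit codes alone means a step that swallows its own exception is treated as a success, which is why the failure surfaces at the file move rather than at the search.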
chhh commented 6 years ago

1) timsTOF data includes ion mobility. I'm not exactly sure how such data gets written to mzML, but it most likely will not be supported: the data is expected to contain standard MS2 scans with precursor isolation windows written in the file.

2) Files are slurped into memory for speed, so searching a 192 GB file won't be possible with the current implementation anyway.
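
Two general Java facts are relevant to the stack trace and to the memory point: array lengths are `int` values (at most 2^31 - 1), and narrowing a `long` to `int` keeps only the low 32 bits, so a large offset or size can come out negative. Whether that is what happened at `MZMLIndexParser.java:97` is not established in this thread. The sketch below only illustrates the mechanism, in Python; `as_java_int` and `fits_in_memory` are hypothetical helper names and the 32 GiB heap is an assumed figure.

```python
INT_MAX = 2**31 - 1  # largest Java int, and the upper bound on a Java array length


def as_java_int(n):
    """Emulate Java's narrowing (int) cast of a long: keep the low 32 bits, signed."""
    return ((n + 2**31) % 2**32) - 2**31


def fits_in_memory(file_size_bytes, heap_bytes):
    """Pre-flight check before reading a whole file into memory."""
    return file_size_bytes < heap_bytes


# A size of 3 GiB no longer fits in an int and wraps to a negative number.
three_gib = 3 * 2**30
print(three_gib > INT_MAX, as_java_int(three_gib))

# A 192 GB file against an assumed 32 GiB heap.
print(fits_in_memory(192 * 10**9, 32 * 2**30))
```

A check like `fits_in_memory` run before loading would turn an opaque exception into a clear "file too large" message.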