bzhanglab / DeepRescore

DeepRescore: rescore PSMs leveraging deep learning-derived peptide features

Spectrum in mgf file not found #2

Open nukaemon opened 3 years ago

nukaemon commented 3 years ago

Dear developers

Thank you for providing this useful tool. I set up DeepRescore on AWS (CentOS 7) with a GPU backend, and it ran successfully on the test data.

I then tried it on my own data but encountered an error at the 'process_pDeep2_results' step. The nextflow command I ran is below, and I believe it is correct. The identification file (output.2021_04_07_02_59_45.t.xml) was generated from the same mgf file (myown.mgf) using X!Tandem.

nextflow run ${DEEPRESCORE} --id_file output.2021_04_07_02_59_45.t.xml --ms_file myown.mgf --se xtandem --ms_instrument Lumos --ms_energy 0.34 --prefix d2 --decoy_prefix XXX_  --cpu 4 --mem 12

The command error message from nextflow shows that the failure occurred in the Java step:

Command error:
  Exception in thread "main" java.io.IOException: Spectrum 'File: D:\Discoverer2_2Data\DiscovererDaemon\200611BPB\F24Z019E_1_5ul.raw; SpectrumID: 2220; scans: 2975' in mgf file 'myown.mgf' not found!
        at com.compomics.util.experiment.massspectrometry.SpectrumFactory.getSpectrum(SpectrumFactory.java:788)
        at com.compomics.util.experiment.massspectrometry.SpectrumFactory.getSpectrum(SpectrumFactory.java:730)
        at PDVGUI.GenerateSpectrumTable.process(GenerateSpectrumTable.java:84)
        at PDVGUI.GenerateSpectrumTable.<init>(GenerateSpectrumTable.java:31)
        at PDVGUI.GenerateSpectrumTable.main(GenerateSpectrumTable.java:21)

I looked at myown.mgf, and it does contain an entry for SpectrumID: 2220; scans: 2975.

.
.
.
BEGIN IONS
TITLE=File: "D:\Discoverer2_2Data\DiscovererDaemon\200611BPB\F24Z019E_1_5ul.raw"; SpectrumID: "2220"; scans: "2975"
PEPMASS=496.76181 16035.10645
CHARGE=2+
RTINSECONDS=724
SCANS=2975
168.998 9.06042
171.242 47.312
176.236 15.0943
183.230 16.0298
186.481 12.8785
.
.
.
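
One visible difference between the two titles: the TITLE line in the mgf file wraps the file path, SpectrumID, and scan number in double quotes, while the title in the error message has none. Whether that mismatch is what makes the lookup fail is an assumption, not something confirmed here. Below is a minimal Python sketch to check it. The helper names (`mgf_titles`, `strip_quotes`, `diagnose`) are hypothetical and not part of DeepRescore or PDV, and the sample lines are copied from this post.

```python
def mgf_titles(lines):
    """Yield the text after 'TITLE=' for every TITLE line in an MGF file."""
    for line in lines:
        if line.startswith("TITLE="):
            yield line[len("TITLE="):].rstrip("\r\n")


def strip_quotes(title):
    """Remove double quotes so quoted and unquoted titles can be compared."""
    return title.replace('"', "")


def diagnose(wanted, titles):
    """Return 'exact', 'quotes-only', or 'missing' for the requested title."""
    titles = list(titles)
    if wanted in titles:
        return "exact"
    if strip_quotes(wanted) in {strip_quotes(t) for t in titles}:
        return "quotes-only"
    return "missing"


# Title exactly as it appears in the Java error message (no quotes).
wanted = (r"File: D:\Discoverer2_2Data\DiscovererDaemon\200611BPB"
          r"\F24Z019E_1_5ul.raw; SpectrumID: 2220; scans: 2975")

# Lines copied from the mgf excerpt above; for a real check, pass an open
# file handle for the mgf file instead of this list.
sample = [
    "BEGIN IONS\n",
    r'TITLE=File: "D:\Discoverer2_2Data\DiscovererDaemon\200611BPB'
    r'\F24Z019E_1_5ul.raw"; SpectrumID: "2220"; scans: "2975"' + "\n",
    "PEPMASS=496.76181 16035.10645\n",
]

print(diagnose(wanted, mgf_titles(sample)))  # prints "quotes-only"
```

A result of `quotes-only` means the spectrum is present but its title only matches once the double quotes are ignored. A result of `missing` would point to some other cause.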

Also, SpectrumID "2220" appears on the first line of 'd2_format_titles.txt', so the error seems to occur as soon as the first entry of 'd2_format_titles.txt' is processed.

I also manually ran each command in the process_pDeep2_results step and confirmed that 'd2_spectrum_pairs.txt' was created but was empty after running PDV-1.6.1.beta.features-jar-with-dependencies.jar.

Do you have any idea how to solve this problem? I have pasted the whole nextflow log below just in case.

log message from nextflow

```
[37/ea937f] process > xml2mzid (d2) [100%] 1 of 1 ✔
[1a/aa62b3] process > calc_basic_features_xt (d2) [100%] 1 of 1 ✔
[89/0c6d10] process > pga_fdr_control (d2) [100%] 1 of 1 ✔
[1d/7ff0ff] process > generate_train_prediction_data (d2) [100%] 1 of 1 ✔
[29/3b4f5d] process > run_pdeep2 (d2) [100%] 1 of 1 ✔
[f0/b0788f] process > process_pDeep2_results (d2) [100%] 1 of 1, failed: 1 ✘
[- ] process > train_autoRT -
[- ] process > predicte_autoRT -
[- ] process > generate_percolator_input -
[- ] process > run_percolator -
[- ] process > generate_pdv_input -
Error executing process > 'process_pDeep2_results (d2)'

Caused by:
  Process `process_pDeep2_results (d2)` terminated with an error exit status (2)

Command executed:

  #!/bin/sh
  mv d2_pdeep2_prediction_results.txt d2_pdeep2_prediction_results.txt.mgf
  Rscript /home/centos/DeepRescore/bin/format_pDeep2_titile.R d2_pdeep2_prediction.txt d2-rawPSMs.txt ./d2_format_titles.txt
  java -Xmx12g -cp /home/centos/DeepRescore/bin/PDV-1.6.1.beta.features/PDV-1.6.1.beta.features-jar-with-dependencies.jar PDVGUI.GenerateSpectrumTable ./d2_format_titles.txt myown.mgf d2_pdeep2_prediction_results.txt.mgf ./d2_spectrum_pairs.txt xtandem
  mkdir sections sections_results
  Rscript /home/centos/DeepRescore/bin/similarity/devide_file.R ./d2_spectrum_pairs.txt 4 ./sections/
  for file in ./sections/*
  do
      name=`basename $file`
      Rscript /home/centos/DeepRescore/bin/similarity/calculate_similarity_SA.R $file ./sections_results/${name}_results.txt &
  done
  wait
  awk 'NR==1 {header=$_} FNR==1 && NR!=1 { $_ ~ $header getline; } {print}' ./sections_results/*_results.txt > ./d2_similarity_SA.txt

Command exit status:
  2

Command output:
  (empty)

Command error:
  Exception in thread "main" java.io.IOException: Spectrum 'File: D:\Discoverer2_2Data\DiscovererDaemon\200611BPB\F24Z019E_1_5ul.raw; SpectrumID: 2220; scans: 2975' in mgf file 'myown.mgf' not found!
        at com.compomics.util.experiment.massspectrometry.SpectrumFactory.getSpectrum(SpectrumFactory.java:788)
        at com.compomics.util.experiment.massspectrometry.SpectrumFactory.getSpectrum(SpectrumFactory.java:730)
        at PDVGUI.GenerateSpectrumTable.process(GenerateSpectrumTable.java:84)
        at PDVGUI.GenerateSpectrumTable.<init>(GenerateSpectrumTable.java:31)
        at PDVGUI.GenerateSpectrumTable.main(GenerateSpectrumTable.java:21)
  Bioconductor version 3.10 (BiocManager 1.30.10), ?BiocManager::install for help
  Bioconductor version '3.10' is out-of-date; the current release version '3.12'
    is available with R version '4.0'; see https://bioconductor.org/install
  ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
  ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
  ✔ tibble 2.1.3 ✔ dplyr 0.8.4
  ✔ tidyr 1.0.0 ✔ stringr 1.4.0
  ✔ readr 1.3.1 ✔ forcats 0.4.0
  ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
  ✖ dplyr::filter() masks stats::filter()
  ✖ dplyr::lag() masks stats::lag()

  Attaching package: ‘data.table’

  The following objects are masked from ‘package:dplyr’:

      between, first, last

  The following object is masked from ‘package:purrr’:

      transpose

  Warning message:
  In fread(args[1]) :
    File './d2_spectrum_pairs.txt' has size 0. Returning a NULL data.table.
  Bioconductor version 3.10 (BiocManager 1.30.10), ?BiocManager::install for help
  Bioconductor version '3.10' is out-of-date; the current release version '3.12'
    is available with R version '4.0'; see https://bioconductor.org/install
  ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
  ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
  ✔ tibble 2.1.3 ✔ dplyr 0.8.4
  ✔ tidyr 1.0.0 ✔ stringr 1.4.0
  ✔ readr 1.3.1 ✔ forcats 0.4.0
  ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
  ✖ dplyr::between() masks data.table::between()
  ✖ dplyr::filter() masks stats::filter()
  ✖ dplyr::first() masks data.table::first()
  ✖ dplyr::lag() masks stats::lag()
  ✖ dplyr::last() masks data.table::last()
  ✖ purrr::transpose() masks data.table::transpose()
  Error in fread(args[1]) :
    File './sections/*' does not exist or is non-readable. getwd()=='/home/centos/Work/test/work/f0/b0788f2a27d40b620301bb2776920b'
  Execution halted
  awk: cannot open ./sections_results/*_results.txt (No such file or directory)

Work dir:
  /home/centos/Work/test/work/f0/b0788f2a27d40b620301bb2776920b

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
```

KaiLiCn commented 3 years ago

Hi,

Sorry for the inconvenience. Could you please share all inputs of process_pDeep2_results with me? I will test and fix it this week.

Kai

nukaemon commented 3 years ago

Hello @KaiLiCn

Thank you for your kind reply and support. I can share the input files via AWS by sending you an S3 presigned URL. May I send it to the Gmail address in your profile?

KaiLiCn commented 3 years ago

Sure. My email address is likaicnsh@gmail.com.

wenbostar commented 3 years ago

@KaiLiCn any updates on this?