fgcz / rawDiag

Brings Orbitrap mass spectrometry data to life; multi-platform, fast and colorful R package
https://bioconductor.org/packages/rawDiag

Error in source(tfo) : negative length vectors are not allowed #67

Closed rcastelo closed 3 years ago

rcastelo commented 3 years ago

hi,

thank you very much for putting together this package to enable reading proprietary Orbitrap RAW files into R. i'm trying to read and process the RAW files at

https://www.ebi.ac.uk/pride/archive/projects/PXD011626

using the rawDiag package but, following the vignette, I'm encountering an error at some point. To reproduce it, you can download the following file:

ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2019/06/PXD011626/180506_S_ROCA_01_01_BS03.raw

as follows:

library(rawDiag)
rawfile <- "180506_S_ROCA_01_01_BS03.raw"
info <- read.raw.info(rawfile)
RAW <- read.raw(rawfile)
executing mono /Users/robert/Library/R/4.0/library/rawDiag/exec/fgcz_raw.exe 180506_S_ROCA_01_01_BS03.raw qc ...
NA values replaced in MasterScanNumber 
renamed LMCorrectionppm to LMCorrection
copied AGC to AGCMode
calculated PrescanMode values
Warning messages:
1: In is.rawDiag(object) :
  missing column name(s): FTResolution, LMCorrection, transient, AGCMode, PrescanMode
2: `funs()` is deprecated as of dplyr 0.8.0.
Please use a list of either functions or lambdas: 

  # Simple named list: 
  list(mean = mean, median = median)

  # Auto named with `tibble::lst()`: 
  tibble::lst(mean, median)

  # Using lambdas
  list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 

note the warning about the call to the deprecated funs() function, which you may want to fix, but the real problem comes when trying to use the function readScans():

scans <- readScans(rawfile, info$`Scan range`[1]:info$`Scan range`[2])
Error in source(tfo) : negative length vectors are not allowed
traceback()
2: source(tfo)
1: readScans(rawfile, info$`Scan range`[1]:info$`Scan range`[2])

the traceback doesn't seem to give a lot of information. i'm pasting below my session information. let me know if i can further help in diagnosing the problem. in principle, i have a proper and up to date installation of the Mono .NET framework for macOS.

thanks!! robert. ps: in EuroBioC2019 in Brussels i met Christian @cpanse and i learned from him about your package to read Orbitrap RAW files.

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rawDiag_0.0.38 colorout_1.2-2

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0  purrr_0.3.4       haven_2.3.1       lattice_0.20-41  
 [5] colorspace_2.0-0  vctrs_0.3.6       generics_0.1.0    yaml_2.2.1       
 [9] blob_1.2.1        rlang_0.4.10      hexbin_1.28.2     pillar_1.4.7     
[13] glue_1.4.2        DBI_1.1.1         tidyverse_1.3.0   bit64_4.0.5      
[17] dbplyr_2.1.0      modelr_0.1.8      readxl_1.3.1      protViz_0.6.8    
[21] lifecycle_0.2.0   stringr_1.4.0     munsell_0.5.0     gtable_0.3.0     
[25] cellranger_1.1.0  rvest_0.3.6       codetools_0.2-18  memoise_2.0.0    
[29] forcats_0.5.1     fastmap_1.1.0     parallel_4.0.3    broom_0.7.4      
[33] Rcpp_1.0.6        readr_1.4.0       backports_1.2.1   scales_1.1.1     
[37] cachem_1.0.3      jsonlite_1.7.2    fs_1.5.0          bit_4.0.4        
[41] ggplot2_3.3.3     hms_1.0.0         stringi_1.5.3     dplyr_1.0.4      
[45] grid_4.0.3        tools_4.0.3       magrittr_2.0.1    tibble_3.0.6     
[49] RSQLite_2.2.3     crayon_1.4.0      tidyr_1.1.2       pkgconfig_2.0.3  
[53] ellipsis_0.3.1    xml2_1.3.2        reprex_1.0.0      lubridate_1.7.9.2
[57] rstudioapi_0.13   assertthat_0.2.1  httr_1.4.2        R6_2.5.0         
[61] compiler_4.0.3   
cpanse commented 3 years ago

Dear Robert,

Thank you for your e-mail.

the file is too big. just split the problem into smaller scan packages (no more than 1000 scans),

e.g., by using mclapply.

usually, it does not make sense to read 30K scans in one piece.

scans <- readScans(rawfile, info$`Scan range`[1]:10)
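A minimal sketch of the chunked reading suggested above. It assumes that `readScans()` returns a list with one element per scan, so that the per-chunk results can be concatenated; `chunk_scans()` and `read_scans_chunked()` are hypothetical helper names, not part of rawDiag.

```r
# Split an inclusive scan range into consecutive chunks of at most `size` scans.
chunk_scans <- function(first, last, size = 1000L) {
  stopifnot(first <= last, size >= 1L)
  scans <- seq.int(first, last)
  unname(split(scans, ceiling(seq_along(scans) / size)))
}

# Hypothetical wrapper: read each chunk separately and concatenate the results.
# Assumption: rawDiag::readScans() returns a list with one element per scan.
read_scans_chunked <- function(rawfile, first, last, size = 1000L, cores = 2L) {
  parts <- parallel::mclapply(
    chunk_scans(first, last, size),
    function(idx) rawDiag::readScans(rawfile, idx),
    mc.cores = cores
  )
  do.call(c, parts)
}

# The chunking itself can be checked without a raw file:
lengths(chunk_scans(1L, 2500L))  # 1000 1000 500
```

With the objects from the report above, the call would look like ``read_scans_chunked(rawfile, info$`Scan range`[1], info$`Scan range`[2])``. Note that `mclapply()` relies on forking, so on Windows `cores` would have to be 1.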

I hope that helps.

Best wishes,

Christian

PS: we have developed a more robust | better designed | more flexible package for reading Orbitrap data into R (see also https://github.com/fgcz/rawrr/releases) and we are going to replace rawDiag's reader function with rawrr's reader functions.

rcastelo commented 3 years ago

Dear Christian,

thanks for your prompt response, indeed reading fewer scans at once works fine. With respect to the new package 'rawrr', do you recommend that I switch to this new package, or do you mean that rawDiag will be calling functions from rawrr?

One further question, let's say i want to check for the presence of the peptide

AAVGQEEIQLR

in this Orbitrap data, have you published any workflow with your package that shows how to check that?

thanks!

robert.

cpanse commented 3 years ago

Dear Robert,

1. If you want to access Orbitrap data, e.g., spectra and chromatograms, switch to rawrr. If you are interested in the diagnostic plots, use rawDiag.

2. Given your raw file, there seems to be a match for scan 9340.

#R
# requires https://CRAN.R-project.org/package=protViz
# and https://github.com/fgcz/rawrr/releases

rawfile <- file.path(Sys.getenv('HOME'), "Downloads", "180506_S_ROCA_01_01_BS03.raw")

# scan index of the raw file (used below for precursor mass and retention time)
Idx <- rawrr::readIndex(rawfile = rawfile)

plot(Idx$precursorMass ~ Idx$rtinseconds, pch=16, col='#55555555', main="LCMS map")

# expected m/z of the doubly protonated peptide
peptide <- "AAVGQEEIQLR"
mass2Hplus <- (protViz::parentIonMass(peptide) + 1.008) / 2

# scans whose precursor mass lies within 0.01 of the expected value
hit <- which(abs(Idx$precursorMass - mass2Hplus) < 0.01)
S <- rawrr::readSpectrum(rawfile, hit)
points(Idx$rtinseconds[hit], Idx$precursorMass[hit], col='red')

# theoretical b- and y-fragment ions of the peptide
fi <- protViz::fragmentIon(peptide)[[1]]
bIons <- fi$b
yIons <- fi$y

# one page per candidate spectrum, overlaid with the b- and y-ion positions
pdf("/tmp/out.pdf", 19, 12)
lapply(S, function(x){
        plot(x, SN = TRUE, diagnostic = TRUE)
        abline(v=bIons, col='#FF555555', lwd=5)
        abline(v=yIons, col='#55FF5555', lwd=5)
})
dev.off()

The code snippet above is not meant to be a real search engine as, e.g., http://comet-ms.sourceforge.net/ is. It is just a visual check ... very heuristic.
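The precursor mass check in the snippet can also be worked through by hand. The sketch below is base R only and rests on stated assumptions: it uses standard monoisotopic residue masses and the proton mass (1.007276) rather than the 1.008 used above, and `precursor_mz()` and `ppm_window()` are hypothetical helpers, not protViz or rawrr functions.

```r
# Monoisotopic residue masses (Da), only for the amino acids in this peptide.
residue_mass <- c(A = 71.03711, V = 99.06841, G = 57.02146, Q = 128.05858,
                  E = 129.04259, I = 113.08406, L = 113.08406, R = 156.10111)
WATER  <- 18.01056   # H2O for the free N- and C-termini
PROTON <- 1.007276   # mass of one proton

# Hypothetical helper: m/z of the [M + zH]z+ ion of an unmodified peptide.
precursor_mz <- function(peptide, z = 2L) {
  aa <- strsplit(peptide, "")[[1]]
  stopifnot(all(aa %in% names(residue_mass)))
  M <- sum(residue_mass[aa]) + WATER   # neutral monoisotopic mass
  (M + z * PROTON) / z
}

# Absolute m/z tolerance that corresponds to a relative tolerance in ppm.
ppm_window <- function(mz, ppm = 10) mz * ppm / 1e6

mz <- precursor_mz("AAVGQEEIQLR", z = 2L)
round(mz, 2)               # 607.33
round(ppm_window(mz), 4)   # 0.0061
```

At this m/z, the fixed window of 0.01 used in the snippet is roughly 16 ppm wide on each side, so a ppm-based window may be the more natural choice for high-resolution Orbitrap data.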

A better example is shown in the manuscript: http://fgcz-ms.uzh.ch/~cpanse/manuscripts/rawrr.pdf

Cheers,

Christian

cpanse commented 3 years ago

@ctrachse Can you help me to fix these dplyr issues?

rcastelo commented 3 years ago

Thanks Christian, this is great. Is there a package for R that would do the job of a real search engine? Could you recommend any reading or course material for learning how to interpret the "relative intensity by m/z" plot to reach the conclusion that a given peptide is there?

cpanse commented 3 years ago

Robert;

There is no R package I have tested and can recommend. I have used Comet since the beginning. It is well-engineered, robust, easy to configure, fun to use from the command line or in automatic workflows, and easy to compile (C++) on all major platforms. It should also be no big deal to build an R wrapper and trigger a system2 call or make a lib call using Rcpp.

Since two mass spec geeks are watching what I write (@ctrachse @tobiasko, can you please help?), I won't be foolish enough to recommend a tutorial on peptide spectrum matching.

I can only tell you it is like tying a sailing knot: a good peptide spectrum match has to look right. E.g., scan 9340 looks like noise, while the example in the manuscript is a perfect match. All high peaks are assigned. Greetings from Zurich, Christian

tobiasko commented 3 years ago

Hi @rcastelo,

you might have a look at the MSGFplus R package. It provides an R interface to the popular MS-GF+ search engine. The engine is written in Java and should therefore run on all popular operating systems. What I don't like is that you need to convert raw data to PSI exchange formats (no direct access).

Regarding the interpretation of XICs: on their own they should NEVER be used for peptide inference (not even for HR-AM data), unless you have a very simple matrix and are in control of the sample composition (QC samples).

Best, Tobi

tobiasko commented 3 years ago

Neonatal dried blood spots are for sure not the type of sample described above.

tobiasko commented 3 years ago

Your data is of the type explained in https://portlandpress.com/biochemist/article/42/5/64/226371/A-beginner-s-guide-to-mass-spectrometry-based, Fig. 2A

rcastelo commented 3 years ago

Dear Christian, dear Tobias,

Thank you very much for your useful comments. I'm not surprised that you say that scan 9340 looks like noise since, as Tobias looked up, this is data acquired from a neonatal dried blood spot, which actually was stored at room temperature for over 5 years, and I guess that makes it a difficult sample. Our proteomics core ran the MS over 20 such samples and did the quantification with MaxQuant. From that quantification the above peptide was detected constitutively across the 20 samples. We already published the results (link) but I'd like to revisit the proteomics data in more depth to see if we can detect further peptides encoded by unannotated transcripts that we have assembled from the corresponding RNA-seq data. For that reason, I'd like to have a closer integration of our R-based pipeline with the proteomics search. I've looked at the MSGFplus package and it seems to provide the functionality I was looking for but, as you say, it requires transforming the RAW files into PSI format. You say you don't like not having direct access to the RAW data; is there any information loss in the conversion from RAW to PSI?

In the rawR manuscript you mention that the ProteoWizard tool can do the conversion. Googling a bit i found that the ThermoRawFileParser tool also allows one to do that conversion. Do you recommend any particular tool for converting?

Do you plan to write an R package such as MSGFplus but that would work directly with RAW files? (I think this would be a great addition to the R/Bioconductor ecosystem!)

cpanse commented 3 years ago

Dear Robert,

1. In general, I would say no critical information is lost when converting to the HUPO-PSI format. However, it is always possible to generate a case where information is lost.

In the beginning, we processed terabytes of MS data by generating mzXML -> Sequest -> ISB TPP -> XML -> HTML. E.g., https://doi.org/10.1038/nbt1300 https://doi.org/10.1371/journal.pbio.1000048

This pipeline became less and less important with the availability of tools that also perform (label-free) quantification, e.g., Mascot Distiller, MaxQuant, FragPipe, Skyline, Spectronaut, Proteome Discoverer ...

All these tools (except FragPipe) operate on the vendor-proprietary data format only. And they will have their reasons.

As a core facility, we have to keep track of all the data. Keeping an entire redundant file takes effort: generating it (in the past, the converter ran only on Microsoft's OS), linking it, archiving it, or keeping a VM with HUPO-PSI converter version x.y.z up and running to stay reproducible.

2. We like and watch the ThermoRawFileParser project but have to stay more conservative and use ProteoWizard to generate HUPO-PSI files if necessary.

C

tobiasko commented 3 years ago

Dear Robert,

I think it would indeed be nice if one could run DB searches or spectral library searches from the R command line using just the raw data as input. But I would definitely take an existing engine and only build an R interface around it. There are already so many good engines (code) out there, so why reinvent the wheel? I guess the problem is that Thermo Fisher Scientific "only" offers a .NET library for raw data access. So your engine also needs to be written in C#; otherwise you have to bridge to Java or C++, ... MaxQuant is written in C#, as are Skyline, Proteome Discoverer and Spectronaut. So these all read the data from the vendor-formatted files. MSFragger (Java) converts to something they call mzBIN files internally, using the .NET library in a preprocessing step.

For your application the information loss is negligible, but officially the http://proteowizard.sourceforge.net/index.html project has the mandate from https://www.hupo.org/Proteomics-Standards-Initiative.

Hope this helps, Tobi

tobiasko commented 3 years ago

I just tried the latest ThermoRawFileParser release on my Mac:

tobiasko@fgcz-m-245 ThermoRawFileParser % mono ThermoRawFileParser.exe -i=/Users/tobiasko/Downloads/20181113_010_autoQC01.raw     
2021-02-12 10:50:19 INFO Started parsing /Users/tobiasko/Downloads/20181113_010_autoQC01.raw
2021-02-12 10:50:20 INFO Processing 21881 MS scans
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 

2021-02-12 10:50:38 INFO Finished parsing /Users/tobiasko/Downloads/20181113_010_autoQC01.raw
tobiasko@fgcz-m-245 ThermoRawFileParser % 

Conversion works without an issue and it should be very easy to write a corresponding system2() call in R. Actually pretty cool!
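A minimal sketch of such a `system2()` call, assuming `mono` is on the PATH and using only the `-i=` option shown in the terminal session above. `trfp_args()` and `convert_raw()` are hypothetical helper names, and the default location of `ThermoRawFileParser.exe` is an assumption that would need to be adapted.

```r
# Build the argument vector for: mono ThermoRawFileParser.exe -i=<rawfile>
trfp_args <- function(rawfile, exe = "ThermoRawFileParser.exe") {
  c(exe, paste0("-i=", rawfile))
}

# Hypothetical wrapper: run the converter and stop on a non-zero exit status.
convert_raw <- function(rawfile, exe = "ThermoRawFileParser.exe") {
  stopifnot(file.exists(rawfile), file.exists(exe))
  # shQuote() protects paths containing spaces from the shell
  status <- system2("mono", args = shQuote(trfp_args(rawfile, exe)))
  if (status != 0) {
    stop("ThermoRawFileParser exited with status ", status)
  }
  invisible(status)
}

# Inspect the command line without running anything:
trfp_args("20181113_010_autoQC01.raw")
```

Keeping the argument construction separate from the call makes it possible to check the command line without having Mono or a raw file at hand.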

tobiasko commented 3 years ago

@rcastelo below is a MSGF+ run on my system:

> res <- runMSGF(par, "/Users/tobiasko/Downloads/20181113_010_autoQC01.mzML")
First time using MSGFplus: Downloading MS-GF+ code
trying URL 'http://proteomics.ucsd.edu/Software/MSGFPlus/MSGFPlus.20140630.zip'
Content type 'application/zip' length 20246296 bytes (19.3 MB)
==================================================
downloaded 19.3 MB

'/usr/bin/java' -Xmx10000M -jar '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/MSGFplus/MSGFPlus/MSGFPlus.jar' -s '/Users/tobiasko/Downloads/20181113_010_autoQC01.mzML' -o '/Users/tobiasko/Downloads/20181113_010_autoQC01.mzid' -d '/Users/tobiasko/Downloads/uniprot-proteome_UP000005640.fasta' -t 10ppm -tda 1 -inst 3 -e 1 -ntt 1 -mod '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/MSGFplus/modification_temp.txt' -minLength 6 -maxLength 25 -minCharge 2 -maxCharge 6 -n 2 

MS-GF+ Beta (v10072) (6/30/2014)
Loading database files...
Creating /Users/tobiasko/Downloads/uniprot-proteome_UP000005640.revCat.fasta.
Creating the suffix array indexed file... Size: 52458889
AlphabetSize: 28
Suffix creation: 0.00% complete.
Suffix creation: 19.06% complete.
Suffix creation: 38.13% complete.
Suffix creation: 57.19% complete.
Suffix creation: 76.25% complete.
Suffix creation: 95.31% complete.
Sorting 0.00% complete.
Sorting 5.81% complete.
Sorting 11.62% complete.
Sorting 17.43% complete.
Sorting 23.24% complete.
Sorting 29.05% complete.
Sorting 34.86% complete.
Sorting 40.67% complete.
Sorting 46.48% complete.
Sorting 52.29% complete.
Sorting 58.10% complete.
Sorting 63.91% complete.
Sorting 69.73% complete.
Sorting 75.54% complete.
Sorting 81.35% complete.
Sorting 87.16% complete.
Sorting 92.97% complete.
Sorting 98.78% complete.
Loading database finished (elapsed time: 40.28 sec)
Reading spectra...
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.sun.xml.bind.v2.runtime.reflect.opt.Injector$1 (file:/Library/Frameworks/R.framework/Versions/4.0/Resources/library/MSGFplus/MSGFPlus/MSGFPlus.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int)
WARNING: Please consider reporting this to the maintainers of com.sun.xml.bind.v2.runtime.reflect.opt.Injector$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Ignoring 0 profile spectra.
Ignoring 2364 spectra having less than 10 peaks.
Reading spectra finished (elapsed time: 51.32 sec)
Using 8 threads.
Search Parameters:
    PrecursorMassTolerance: 10.0ppm
    IsotopeError: 0,1
    TargetDecoyAnalysis: true
    FragmentationMethod: As written in the spectrum or CID if no info
    Instrument: QExactive
    Enzyme: Tryp
    Protocol: Standard
    NumTolerableTermini: 1
    MinPeptideLength: 6
    MaxPeptideLength: 25
    NumMatchesPerSpec: 2
Spectrum 0-18521 (total: 18522)
pool-1-thread-1: Preprocessing spectra...
pool-1-thread-8: Preprocessing spectra...
Loading built-in param file: HCD_QExactive_Tryp.param
Loading built-in param file: HCD_QExactive_Tryp.param
pool-1-thread-2: Preprocessing spectra...
Loading built-in param file: HCD_QExactive_Tryp.param
pool-1-thread-7: Preprocessing spectra...
Loading built-in param file: HCD_QExactive_Tryp.param
pool-1-thread-6: Preprocessing spectra...
Loading built-in param file: HCD_QExactive_Tryp.param
pool-1-thread-4: Preprocessing spectra...
pool-1-thread-3: Preprocessing spectra...
Loading built-in param file: HCD_QExactive_Tryp.param
Loading built-in param file: HCD_QExactive_Tryp.param
pool-1-thread-5: Preprocessing spectra...
Loading built-in param file: HCD_QExactive_Tryp.param
pool-1-thread-3: Preprocessing spectra finished (elapsed time: 28.00 sec)
pool-1-thread-3: Database search...
pool-1-thread-3: Database search progress... 0.0% complete
pool-1-thread-2: Preprocessing spectra finished (elapsed time: 28.00 sec)
pool-1-thread-2: Database search...
pool-1-thread-2: Database search progress... 0.0% complete
pool-1-thread-7: Preprocessing spectra finished (elapsed time: 28.00 sec)
pool-1-thread-7: Database search...
pool-1-thread-7: Database search progress... 0.0% complete
pool-1-thread-8: Preprocessing spectra finished (elapsed time: 28.00 sec)
pool-1-thread-8: Database search...
pool-1-thread-8: Database search progress... 0.0% complete
pool-1-thread-6: Preprocessing spectra finished (elapsed time: 28.00 sec)
pool-1-thread-6: Database search...
pool-1-thread-6: Database search progress... 0.0% complete
pool-1-thread-4: Preprocessing spectra finished (elapsed time: 28.00 sec)
pool-1-thread-4: Database search...
pool-1-thread-4: Database search progress... 0.0% complete
pool-1-thread-5: Preprocessing spectra finished (elapsed time: 28.00 sec)
pool-1-thread-5: Database search...
pool-1-thread-5: Database search progress... 0.0% complete
pool-1-thread-1: Preprocessing spectra finished (elapsed time: 28.00 sec)
pool-1-thread-1: Database search...
pool-1-thread-1: Database search progress... 0.0% complete
pool-1-thread-1: Database search progress... 3.8% complete
pool-1-thread-7: Database search progress... 3.8% complete
pool-1-thread-5: Database search progress... 3.8% complete
pool-1-thread-4: Database search progress... 3.8% complete
pool-1-thread-6: Database search progress... 3.8% complete
pool-1-thread-8: Database search progress... 3.8% complete
pool-1-thread-2: Database search progress... 3.8% complete
pool-1-thread-3: Database search progress... 3.8% complete
pool-1-thread-3: Database search progress... 7.6% complete
pool-1-thread-6: Database search progress... 7.6% complete
pool-1-thread-8: Database search progress... 7.6% complete
pool-1-thread-4: Database search progress... 7.6% complete
pool-1-thread-2: Database search progress... 7.6% complete
pool-1-thread-1: Database search progress... 7.6% complete
pool-1-thread-5: Database search progress... 7.6% complete
pool-1-thread-7: Database search progress... 7.6% complete
pool-1-thread-8: Database search progress... 11.4% complete
pool-1-thread-6: Database search progress... 11.4% complete
pool-1-thread-1: Database search progress... 11.4% complete
pool-1-thread-5: Database search progress... 11.4% complete
pool-1-thread-7: Database search progress... 11.4% complete
pool-1-thread-3: Database search progress... 11.4% complete
pool-1-thread-4: Database search progress... 11.4% complete
pool-1-thread-2: Database search progress... 11.4% complete
pool-1-thread-8: Database search progress... 15.3% complete
pool-1-thread-5: Database search progress... 15.3% complete
pool-1-thread-3: Database search progress... 15.3% complete
pool-1-thread-7: Database search progress... 15.3% complete
pool-1-thread-4: Database search progress... 15.3% complete
pool-1-thread-1: Database search progress... 15.3% complete
pool-1-thread-6: Database search progress... 15.3% complete
pool-1-thread-2: Database search progress... 15.3% complete
pool-1-thread-1: Database search progress... 19.1% complete
pool-1-thread-7: Database search progress... 19.1% complete
pool-1-thread-3: Database search progress... 19.1% complete
pool-1-thread-2: Database search progress... 19.1% complete
pool-1-thread-8: Database search progress... 19.1% complete
pool-1-thread-4: Database search progress... 19.1% complete
pool-1-thread-5: Database search progress... 19.1% complete
pool-1-thread-6: Database search progress... 19.1% complete
pool-1-thread-8: Database search progress... 22.9% complete
pool-1-thread-3: Database search progress... 22.9% complete
pool-1-thread-4: Database search progress... 22.9% complete
pool-1-thread-1: Database search progress... 22.9% complete
pool-1-thread-6: Database search progress... 22.9% complete
pool-1-thread-7: Database search progress... 22.9% complete
pool-1-thread-2: Database search progress... 22.9% complete
pool-1-thread-5: Database search progress... 22.9% complete
pool-1-thread-7: Database search progress... 26.7% complete
pool-1-thread-8: Database search progress... 26.7% complete
pool-1-thread-3: Database search progress... 26.7% complete
pool-1-thread-5: Database search progress... 26.7% complete
pool-1-thread-2: Database search progress... 26.7% complete
pool-1-thread-1: Database search progress... 26.7% complete
pool-1-thread-6: Database search progress... 26.7% complete
pool-1-thread-4: Database search progress... 26.7% complete
pool-1-thread-3: Database search progress... 30.5% complete
pool-1-thread-1: Database search progress... 30.5% complete
pool-1-thread-8: Database search progress... 30.5% complete
pool-1-thread-5: Database search progress... 30.5% complete
pool-1-thread-4: Database search progress... 30.5% complete
pool-1-thread-7: Database search progress... 30.5% complete
pool-1-thread-2: Database search progress... 30.5% complete
pool-1-thread-6: Database search progress... 30.5% complete
pool-1-thread-3: Database search progress... 34.3% complete
pool-1-thread-8: Database search progress... 34.3% complete
pool-1-thread-5: Database search progress... 34.3% complete
pool-1-thread-6: Database search progress... 34.3% complete
pool-1-thread-1: Database search progress... 34.3% complete
pool-1-thread-7: Database search progress... 34.3% complete
pool-1-thread-4: Database search progress... 34.3% complete
pool-1-thread-2: Database search progress... 34.3% complete
pool-1-thread-3: Database search progress... 38.1% complete
pool-1-thread-1: Database search progress... 38.1% complete
pool-1-thread-5: Database search progress... 38.1% complete
pool-1-thread-2: Database search progress... 38.1% complete
pool-1-thread-8: Database search progress... 38.1% complete
pool-1-thread-7: Database search progress... 38.1% complete
pool-1-thread-6: Database search progress... 38.1% complete
pool-1-thread-4: Database search progress... 38.1% complete
pool-1-thread-5: Database search progress... 41.9% complete
pool-1-thread-6: Database search progress... 41.9% complete
pool-1-thread-1: Database search progress... 41.9% complete
pool-1-thread-8: Database search progress... 41.9% complete
pool-1-thread-3: Database search progress... 41.9% complete
pool-1-thread-4: Database search progress... 41.9% complete
pool-1-thread-7: Database search progress... 41.9% complete
pool-1-thread-2: Database search progress... 41.9% complete
pool-1-thread-6: Database search progress... 45.8% complete
pool-1-thread-5: Database search progress... 45.8% complete
pool-1-thread-8: Database search progress... 45.8% complete
pool-1-thread-1: Database search progress... 45.8% complete
pool-1-thread-4: Database search progress... 45.8% complete
pool-1-thread-3: Database search progress... 45.8% complete
pool-1-thread-7: Database search progress... 45.8% complete
pool-1-thread-2: Database search progress... 45.8% complete
pool-1-thread-1: Database search progress... 49.6% complete
pool-1-thread-3: Database search progress... 49.6% complete
pool-1-thread-8: Database search progress... 49.6% complete
pool-1-thread-6: Database search progress... 49.6% complete
pool-1-thread-4: Database search progress... 49.6% complete
pool-1-thread-5: Database search progress... 49.6% complete
pool-1-thread-7: Database search progress... 49.6% complete
pool-1-thread-2: Database search progress... 49.6% complete
pool-1-thread-1: Database search progress... 53.4% complete
pool-1-thread-6: Database search progress... 53.4% complete
pool-1-thread-5: Database search progress... 53.4% complete
pool-1-thread-3: Database search progress... 53.4% complete
pool-1-thread-8: Database search progress... 53.4% complete
pool-1-thread-4: Database search progress... 53.4% complete
pool-1-thread-7: Database search progress... 53.4% complete
pool-1-thread-2: Database search progress... 53.4% complete
pool-1-thread-3: Database search progress... 57.2% complete
pool-1-thread-1: Database search progress... 57.2% complete
pool-1-thread-8: Database search progress... 57.2% complete
pool-1-thread-6: Database search progress... 57.2% complete
pool-1-thread-5: Database search progress... 57.2% complete
pool-1-thread-4: Database search progress... 57.2% complete
pool-1-thread-7: Database search progress... 57.2% complete
pool-1-thread-2: Database search progress... 57.2% complete
pool-1-thread-3: Database search progress... 61.0% complete
pool-1-thread-1: Database search progress... 61.0% complete
pool-1-thread-8: Database search progress... 61.0% complete
pool-1-thread-6: Database search progress... 61.0% complete
pool-1-thread-4: Database search progress... 61.0% complete
pool-1-thread-5: Database search progress... 61.0% complete
pool-1-thread-7: Database search progress... 61.0% complete
pool-1-thread-2: Database search progress... 61.0% complete
pool-1-thread-4: Database search progress... 64.8% complete
pool-1-thread-3: Database search progress... 64.8% complete
pool-1-thread-6: Database search progress... 64.8% complete
pool-1-thread-1: Database search progress... 64.8% complete
pool-1-thread-5: Database search progress... 64.8% complete
pool-1-thread-8: Database search progress... 64.8% complete
pool-1-thread-7: Database search progress... 64.8% complete
pool-1-thread-2: Database search progress... 64.8% complete
pool-1-thread-6: Database search progress... 68.6% complete
pool-1-thread-8: Database search progress... 68.6% complete
pool-1-thread-1: Database search progress... 68.6% complete
pool-1-thread-4: Database search progress... 68.6% complete
pool-1-thread-5: Database search progress... 68.6% complete
pool-1-thread-3: Database search progress... 68.6% complete
pool-1-thread-7: Database search progress... 68.6% complete
pool-1-thread-2: Database search progress... 68.6% complete
pool-1-thread-6: Database search progress... 72.4% complete
pool-1-thread-3: Database search progress... 72.4% complete
pool-1-thread-8: Database search progress... 72.4% complete
pool-1-thread-5: Database search progress... 72.4% complete
pool-1-thread-1: Database search progress... 72.4% complete
pool-1-thread-2: Database search progress... 72.4% complete
pool-1-thread-4: Database search progress... 72.4% complete
pool-1-thread-7: Database search progress... 72.4% complete
pool-1-thread-8: Database search progress... 76.3% complete
pool-1-thread-6: Database search progress... 76.3% complete
pool-1-thread-1: Database search progress... 76.3% complete
pool-1-thread-3: Database search progress... 76.3% complete
pool-1-thread-5: Database search progress... 76.3% complete
pool-1-thread-4: Database search progress... 76.3% complete
pool-1-thread-2: Database search progress... 76.3% complete
pool-1-thread-7: Database search progress... 76.3% complete
pool-1-thread-6: Database search progress... 80.1% complete
pool-1-thread-8: Database search progress... 80.1% complete
pool-1-thread-1: Database search progress... 80.1% complete
pool-1-thread-3: Database search progress... 80.1% complete
pool-1-thread-4: Database search progress... 80.1% complete
pool-1-thread-5: Database search progress... 80.1% complete
pool-1-thread-2: Database search progress... 80.1% complete
pool-1-thread-7: Database search progress... 80.1% complete
pool-1-thread-6: Database search progress... 83.9% complete
pool-1-thread-8: Database search progress... 83.9% complete
pool-1-thread-1: Database search progress... 83.9% complete
pool-1-thread-3: Database search progress... 83.9% complete
pool-1-thread-4: Database search progress... 83.9% complete
pool-1-thread-5: Database search progress... 83.9% complete
pool-1-thread-2: Database search progress... 83.9% complete
pool-1-thread-7: Database search progress... 83.9% complete
pool-1-thread-8: Database search progress... 87.7% complete
pool-1-thread-6: Database search progress... 87.7% complete
pool-1-thread-1: Database search progress... 87.7% complete
pool-1-thread-3: Database search progress... 87.7% complete
pool-1-thread-4: Database search progress... 87.7% complete
pool-1-thread-2: Database search progress... 87.7% complete
pool-1-thread-5: Database search progress... 87.7% complete
pool-1-thread-7: Database search progress... 87.7% complete
pool-1-thread-8: Database search progress... 91.5% complete
pool-1-thread-6: Database search progress... 91.5% complete
pool-1-thread-3: Database search progress... 91.5% complete
pool-1-thread-1: Database search progress... 91.5% complete
pool-1-thread-4: Database search progress... 91.5% complete
pool-1-thread-2: Database search progress... 91.5% complete
pool-1-thread-5: Database search progress... 91.5% complete
pool-1-thread-7: Database search progress... 91.5% complete
pool-1-thread-8: Database search progress... 95.3% complete
pool-1-thread-6: Database search progress... 95.3% complete
pool-1-thread-3: Database search progress... 95.3% complete
pool-1-thread-1: Database search progress... 95.3% complete
pool-1-thread-4: Database search progress... 95.3% complete
pool-1-thread-2: Database search progress... 95.3% complete
pool-1-thread-5: Database search progress... 95.3% complete
pool-1-thread-7: Database search progress... 95.3% complete
pool-1-thread-3: Database search progress... 99.1% complete
pool-1-thread-8: Database search progress... 99.1% complete
pool-1-thread-1: Database search progress... 99.1% complete
pool-1-thread-6: Database search progress... 99.1% complete
pool-1-thread-4: Database search progress... 99.1% complete
pool-1-thread-5: Database search progress... 99.1% complete
pool-1-thread-2: Database search progress... 99.1% complete
pool-1-thread-7: Database search progress... 99.1% complete
pool-1-thread-3: Database search finished (elapsed time: 459.00 sec)
pool-1-thread-3: Computing spectral E-values...
pool-1-thread-8: Database search finished (elapsed time: 458.00 sec)
pool-1-thread-8: Computing spectral E-values...
pool-1-thread-6: Database search finished (elapsed time: 458.00 sec)
pool-1-thread-6: Computing spectral E-values...
pool-1-thread-1: Database search finished (elapsed time: 458.00 sec)
pool-1-thread-1: Computing spectral E-values...
pool-1-thread-4: Database search finished (elapsed time: 459.00 sec)
pool-1-thread-4: Computing spectral E-values...
pool-1-thread-5: Database search finished (elapsed time: 459.00 sec)
pool-1-thread-5: Computing spectral E-values...
pool-1-thread-2: Database search finished (elapsed time: 459.00 sec)
pool-1-thread-2: Computing spectral E-values...
pool-1-thread-7: Database search finished (elapsed time: 460.00 sec)
pool-1-thread-7: Computing spectral E-values...
pool-1-thread-3: Computing spectral E-values... 43.2% complete
pool-1-thread-1: Computing spectral E-values... 43.2% complete
pool-1-thread-8: Computing spectral E-values... 43.2% complete
pool-1-thread-6: Computing spectral E-values... 43.2% complete
pool-1-thread-4: Computing spectral E-values... 43.2% complete
pool-1-thread-5: Computing spectral E-values... 43.2% complete
pool-1-thread-2: Computing spectral E-values... 43.2% complete
pool-1-thread-7: Computing spectral E-values... 43.2% complete
pool-1-thread-3: Computing spectral E-values... 86.4% complete
pool-1-thread-1: Computing spectral E-values... 86.4% complete
pool-1-thread-8: Computing spectral E-values... 86.4% complete
pool-1-thread-5: Computing spectral E-values... 86.4% complete
pool-1-thread-4: Computing spectral E-values... 86.4% complete
pool-1-thread-7: Computing spectral E-values... 86.4% complete
pool-1-thread-2: Computing spectral E-values... 86.4% complete
pool-1-thread-6: Computing spectral E-values... 86.4% complete
pool-1-thread-3: Computing spectral E-values finished (elapsed time: 163.00 sec)
pool-1-thread-1: Computing spectral E-values finished (elapsed time: 163.00 sec)
pool-1-thread-8: Computing spectral E-values finished (elapsed time: 164.00 sec)
pool-1-thread-5: Computing spectral E-values finished (elapsed time: 163.00 sec)
pool-1-thread-4: Computing spectral E-values finished (elapsed time: 163.00 sec)
pool-1-thread-2: Computing spectral E-values finished (elapsed time: 163.00 sec)
pool-1-thread-7: Computing spectral E-values finished (elapsed time: 163.00 sec)
pool-1-thread-6: Computing spectral E-values finished (elapsed time: 164.00 sec)
Computing q-values...
Computing q-values finished (elapsed time: 0.22 sec)
Writing results...
Writing results finished (elapsed time: 12.54 sec)
MS-GF+ complete (total elapsed time: 716.63 sec)
reading 20181113_010_autoQC01.mzid... DONE!

The results are written to mzIdentML files (another HUPO-PSI format). These can be read using the Bioconductor package mzID:

> library(mzID)
> results <- mzID("/Users/tobiasko/Downloads/20181113_010_autoQC01.mzid")
reading 20181113_010_autoQC01.mzid... DONE!
> results
An mzID object

Software used:   MS-GF+ (version: Beta (v10072))

Rawfile:         /Users/tobiasko/Downloads/20181113_010_autoQC01.mzML

Database:        /Users/tobiasko/Downloads/uniprot-proteome_UP000005640.fasta

Number of scans: 11780
Number of PSM's: 24446

Hope this helps, Tobi
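To come back to the original question (is a given peptide among the identifications?), one could filter the PSM table for its sequence. This is only a sketch under assumptions: it expects the search results as a data frame, for example from `mzID::flatten(results)`, and it assumes the peptide sequence sits in a column named `pepseq`. `find_peptide()` is a hypothetical helper and the toy table is made-up illustration data, not output from this run.

```r
# Hypothetical helper: keep only the PSM rows that match a peptide sequence.
find_peptide <- function(psms, peptide, seq_col = "pepseq") {
  stopifnot(is.data.frame(psms), seq_col %in% names(psms))
  psms[psms[[seq_col]] == peptide, , drop = FALSE]
}

# Toy table for illustration only (made-up rows, not real search results).
toy <- data.frame(
  pepseq = c("AAVGQEEIQLR", "PEPTIDEK", "AAVGQEEIQLR"),
  row_id = 1:3,
  stringsAsFactors = FALSE
)

nrow(find_peptide(toy, "AAVGQEEIQLR"))  # 2
```

With real results the call would be along the lines of `find_peptide(mzID::flatten(results), "AAVGQEEIQLR")`. A sequence match alone says nothing about confidence, so the score and q-value columns of the matching rows still need to be inspected.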

rcastelo commented 3 years ago

Dear Christian and Tobias, thank you very much again for sharing your frank opinion on this and for your very specific advice. I agree that if there's robust code existing for a specific analysis goal, it doesn't make sense to rewrite it. I just thought that because in the rawR manuscript you motivate the need to have direct access to raw MS data from R, maybe you also had the search problem in mind, but I agree with you that if we can wrap it from R, then it's a matter of finding the suitable containers to exchange the data with upstream and downstream parts of the pipeline. I'll try the MSGF engine with the conversion tools that you have showcased. Many thanks again, you are great!!!