fgcz / rawDiag

Brings Orbitrap mass spectrometry data to life; multi-platform, fast and colorful R package
https://bioconductor.org/packages/rawDiag

implement `MsBackendRawDiag.R` for `Spectra` #51

Closed cpanse closed 5 years ago

cpanse commented 5 years ago

https://github.com/fgcz/rawDiag/blob/0d3f5d465296240d2c100357c95a2777d7e74770/R/rawDiag.R#L496

https://github.com/rformassspectrometry/Spectra

tobiasko commented 5 years ago

I think this task is not named correctly. If we go for this, we do not want a coercion function; we need to define a backend that can access spectral data stored in raw files:

https://github.com/rformassspectrometry/Spectra/blob/master/man/MsBackend.Rd

The easiest option might be to create an MsBackendDataFrame that keeps all data in memory. A backend that keeps peak lists on disk might be a cool thing; here lazy evaluation could become pretty powerful. One could also think about mixed models: keep the metadata in memory and the spectral data on disk.
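The mixed model mentioned above can be illustrated with a minimal base R sketch. It is not the Spectra backend API: `make_disk_index()` and `load_peaks()` are hypothetical helper names, the peak values are toy data, and per-scan `.rds` files merely stand in for whatever on-disk format a real backend would use.

```r
# Hypothetical sketch: metadata in memory, peak lists on disk (base R only).
# None of these function names belong to Spectra or rawDiag.

# Write each peak list to its own .rds file and return the metadata
# extended by a column that records where the peaks are stored.
make_disk_index <- function(peaks, meta, dir = tempfile("peaks_")) {
  stopifnot(length(peaks) == nrow(meta))
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  meta$peakFile <- file.path(dir, sprintf("scan_%05d.rds", seq_along(peaks)))
  for (i in seq_along(peaks)) saveRDS(peaks[[i]], meta$peakFile[i])
  meta
}

# Read the peaks of the selected scans only when they are requested.
load_peaks <- function(index, i) lapply(index$peakFile[i], readRDS)

# Toy data: three scans with made-up peaks.
meta <- data.frame(scanIndex = 1:3, msLevel = c(1L, 2L, 2L),
                   rtime = c(0.097, 0.35, 0.419))
peaks <- list(cbind(mz = c(100.1, 200.2), intensity = c(10, 20)),
              cbind(mz = 150.5, intensity = 5),
              cbind(mz = c(300.3, 400.4), intensity = c(7, 8)))

index <- make_disk_index(peaks, meta)

# Filtering touches the in-memory metadata only ...
ms2 <- which(index$msLevel == 2L)
# ... and the disk is read for the selected scans alone.
ms2_peaks <- load_peaks(index, ms2)
```

The point of the design is that subsetting and filtering stay cheap because they operate on the small metadata table, while the potentially large peak data is only read for the scans that survive the filter.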

lgatto commented 5 years ago

Yes, I confirm that the correct way forward would be to define a backend that uses the Mono libraries to access the raw data, with the metadata in a DataFrame. See MsBackendMzR, which does exactly that but with mzR.

cpanse commented 5 years ago

@lgatto @tobiasko yep; I am not so fast ... upgrading BioC from 3.7 to devel on my playground box reminds me of my 1st SuSE4.2 install in 1996 (except for changing the install disks 1-3).

tobiasko commented 5 years ago

@lgatto, would you suggest bundling the code for the alternative backend into a separate Bioconductor package? Will Bioconductor be OK with hosting Thermo DLLs?

https://planetorbitrap.com/rawfilereader

lgatto commented 5 years ago

Yes, a separate MsBackendRawDiag package on GitHub that ships the DLLs is an option. Alternatively, a function that downloads them upon first initialisation is also a solution.
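The download-on-first-initialisation idea could look roughly like the following base R sketch. Everything here is an assumption made for illustration: `ensure_assemblies()` is a hypothetical helper, the DLL names are placeholders, and `fake_fetch()` stands in for a real downloader such as `utils::download.file()` so that the sketch runs offline.

```r
# Hypothetical helper: make sure the vendor assemblies exist in a local
# cache directory, fetching only those that are absent.
ensure_assemblies <- function(files, cache_dir, fetch) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  target <- file.path(cache_dir, files)
  absent <- !file.exists(target)
  for (i in which(absent)) fetch(files[i], target[i])
  still_absent <- !file.exists(target)
  if (any(still_absent))
    stop("could not obtain: ", paste(files[still_absent], collapse = ", "))
  invisible(target)
}

# Stand-in downloader: counts its calls and writes a placeholder file.
n_fetch <- 0L
fake_fetch <- function(name, dest) {
  n_fetch <<- n_fetch + 1L
  writeLines(paste("placeholder for", name), dest)
}

cache <- tempfile("assemblies_")
dlls <- c("vendorA.dll", "vendorB.dll")     # placeholder file names

ensure_assemblies(dlls, cache, fake_fetch)  # first call fetches both files
ensure_assemblies(dlls, cache, fake_fetch)  # second call finds them cached
```

In a package, a check of this kind would presumably be called from the backend's initialisation code, so that users are only asked to obtain the vendor libraries once.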

lgatto commented 5 years ago

Pinging @jorainer

cpanse commented 5 years ago

1. All in memory

library(rawDiag)
(rawfile <- file.path(path.package(package = 'rawDiag'), 'extdata', 'sample.raw'))
system.time(PLS <-readScans(rawfile))
system.time(DF <- as.peaklistSet.DataFrame(PLS))
DF$fromFile = as.integer(1)

if(require(Spectra)){
  rawDiagSample <- MsBackendDataFrame()
  system.time(BE <- backendInitialize(object=rawDiagSample, files=rawfile, spectraData=DF))
}
R> library(rawDiag)
R> (rawfile <- file.path(path.package(package = 'rawDiag'), 'extdata', 'sample.raw'))
[1] "/home/cp/R/x86_64-pc-linux-gnu-library/3.6/rawDiag/extdata/sample.raw"
R> system.time(PLS <-readScans(rawfile))
   user  system elapsed 
  0.123   0.053   0.643 
R> system.time(DF <- as.peaklistSet.DataFrame(PLS))
   user  system elapsed 
  0.029   0.000   0.029 
R> DF$fromFile = as.integer(1)
R> 
R> if(require(Spectra)){
+    rawDiagSample <- MsBackendDataFrame()
+    system.time(BE <- backendInitialize(object=rawDiagSample, files=rawfile, spectraData=DF))
+  }
   user  system elapsed 
  0.122   0.000   0.122 
R> BE
MsBackendDataFrame with 574 spectra
      msLevel     rtime scanIndex
    <integer> <numeric> <integer>
1           1     0.097         1
2           2      0.35         2
3           2     0.419         3
4           2     0.489         4
5           2     0.558         5
...       ...       ...       ...
570         2    46.512       570
571         2    46.581       571
572         2    46.651       572
573         1    46.806       573
574         2    47.059       574
 ... 18 more variables/columns.

2. Use MsBackendRawDiag()

library(rawDiag)
# locate the example raw file shipped with rawDiag
rawfile <- file.path(path.package(package = 'rawDiag'), 'extdata', 'sample.raw')
fls <- rawfile  # character vector of one or more raw files
be <- backendInitialize(MsBackendRawDiag(), files = fls)
sps_thermofinnigan <- Spectra(be)
sps_thermofinnigan
R> sps_thermofinnigan
MSn data (Spectra) with 574 spectra in a MsBackendRawDiag backend:
      msLevel     rtime scanIndex
    <integer> <numeric> <integer>
1           1     0.097         1
2           2      0.35         2
3           2     0.419         3
4           2     0.489         4
5           2     0.558         5
...       ...       ...       ...
570         2    46.512       570
571         2    46.581       571
572         2    46.651       572
573         1    46.806       573
574         2    47.059       574
 ... 15 more variables/columns.

file(s):
sample.raw
Processing:

R> 

tobiasko commented 5 years ago

Cool!!!

But this statement still confuses me: as.peaklistSet.DataFrame(...)

The above function coerces a peaklistSet object to a DataFrame? Why not simply as.DataFrame(...)? Like date <- as.Date("2017-01-01").

And I am still wondering if there is a more elegant way to initialize the backend. Maybe @lgatto can explain to us why backendInitialize(...) needs a file argument when used for MsBackendDataFrame. DF$fromFile = as.integer(1) looks creepy.
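On the naming question: S3 method names conventionally follow the pattern `<generic>.<class>`, so `as.peaklistSet.DataFrame` would normally be read as a method of a generic `as.peaklistSet` for class `DataFrame`, which is the opposite of the intended direction. A minimal base R sketch of the conventional form follows; the `peaklistSet` layout used here is a toy stand-in and not rawDiag's actual class.

```r
# Toy stand-in for a peaklistSet: a classed list of scans (hypothetical layout).
pls <- structure(
  list(list(scan = 1L, msLevel = 1L, rtime = 0.097),
       list(scan = 2L, msLevel = 2L, rtime = 0.35)),
  class = "peaklistSet")

# The method name is <generic>.<class>; as.data.frame() then dispatches
# on the class of its first argument.
as.data.frame.peaklistSet <- function(x, ...) {
  do.call(rbind, lapply(unclass(x), as.data.frame))
}

df <- as.data.frame(pls)
```

For an S4 target class such as `DataFrame`, the analogous route would presumably be a coercion registered with `methods::setAs()` and called as `as(x, "DataFrame")`.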

jorainer commented 5 years ago

Nice! But be aware that there will be some quite substantial changes to the MsBackend:

I am currently finalizing the required changes in Spectra and fixing/adding unit tests. I'll ping you when it is ready.

tobiasko commented 5 years ago

@jorainer Thx for keeping us in the loop! Is the class definition of Spectra already stable/is it safe to inherit?

lgatto commented 5 years ago

There aren't any major changes to be expected, but some small changes are very possible. If you prefer a more stable version, I would suggest waiting for a pre-Bioconductor release (I think it is conceivable that we will submit for the next release).

cpanse commented 5 years ago

> Cool!!!
>
> But this statement still confuses me: as.peaklistSet.DataFrame(...)
>
> The above function coerces a peaklistSet object to a DataFrame? Why not simply as.DataFrame(...)? Like date <- as.Date("2017-01-01").

@tobiasko this is just S3 cosmetics (see https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Registering-S3-methods); it is only a Hello, World! function. We won't use that S3 method.

> And I am still wondering if there is a more elegant way to initialize the backend. Maybe @lgatto can explain us why the backendInitialize(...) needs a file argument if used for MsBackendDataFrame. DF$fromFile = as.integer(1) looks creepy.

tobiasko commented 5 years ago

@cpanse Ok, let's discuss coding style issues offline! ;-)

jorainer commented 5 years ago

We recently made some changes in the MsBackend definition:

dataStorage is intended as the replacement for @files, with the difference that it returns, for each spectrum, its current storage location (e.g. mzML file, memory, HDF5 file, ...).

I don't expect any more major changes in the MsBackend class. I will now mostly work on implementing all missing analysis methods for Spectra.

cpanse commented 5 years ago

@jorainer @lgatto @sgibb; we are in the process of shaping a MsBackendRawFileReader package.

Package: MsBackendRawFileReader
Type: Package
Title: Bridging Spectra and ThermoFinnigan raw files
Version: 0.0.1
Authors@R: c(person(given = "Christian",
    family = "Panse", email = "cp@fgcz.ethz.ch", role = c("aut", "cre"),
    comment = c(ORCID = "0000-0003-1975-3064")),
    person(given = "Tobias", family = "Kockmann",
      email = "Tobias.Kockmann@fgcz.ethz.ch", role = "aut", 
    comment = c(ORCID = "0000-0002-1847-885X")))
Depends: R (>= 3.6),
    IRanges,
    methods,
    Spectra,
    rDotNet (>= 0.9)
Suggests:
    knitr,
    testthat
Description: implements an MsBackend for the Spectra package using
  Thermo Fisher Scientific's NewRawFileReader .Net libraries.
  The package is generalizing the functionality introduced by the
  rawDiag package (Trachsel, 2018 <doi:10.1021/acs.jproteome.8b00173>).
SystemRequirements: mono 4.x or higher on OSX / Linux, .NET 4.x or
        higher on Windows, 'msbuild' and 'nuget' available in the path
URL: https://github.com/cpanse/MsBackendRawFileReader
BugReports: https://github.com/cpanse/MsBackendRawFileReader/issues
Encoding: UTF-8
LazyData: true
NeedsCompilation: no
RoxygenNote: 6.1.1
License: GPL-3
VignetteBuilder: knitr
Collate: 
    'hidden_aliases.R'
    'AllGenerics.R'
    'MsBackendRawFileReader-functions.R'
    'MsBackendRawFileReader.R'
    'zzz.R'

lgatto commented 5 years ago

That's cool.

I initially had the impression that you were using MsBackendDataFrame as a template. You should, however, rather look at MsBackendMzR for a better fit.

cpanse commented 5 years ago

@lgatto at the moment I am only ctrl-c/ctrl-v-ing from MsBackendMzR. We keep it private for the moment to avoid confusion, but if you wish, we can add all three of you to the repository at any time.

lgatto commented 5 years ago

I wouldn't have much time for development at the moment, but would be happy to provide support if needed. I imagine that @jorainer and @sgibb would also contribute important feedback.

cpanse commented 5 years ago

> I wouldn't have much time for development at the moment, but would be happy to provide support if needed. I imagine that @jorainer and @sgibb would also contribute important feedback.

@lgatto @jorainer @sgibb we added you to the repo to provide some transparency and to make it easy to give feedback.