Closed: cpanse closed this issue 5 years ago
I think this task is not named correctly. If we go for this, we do not want a coercion function; instead we need to define a backend that can access spectral data stored in raw files:
https://github.com/rformassspectrometry/Spectra/blob/master/man/MsBackend.Rd
The easiest might be to create an MsBackendDataFrame that keeps all data in memory. A backend that keeps peak lists on disk might be a cool thing; here lazy evaluation could become pretty powerful. One could also think about mixed models: keep metadata in memory and spectral data on disk.
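To make the "mixed model" idea concrete, here is a minimal base-R sketch. It is not a Spectra backend; `make_lazy_store` and the one-file-per-spectrum layout are hypothetical and only illustrate keeping metadata in memory while peaks are read from disk on demand.

```r
# Toy "mixed" store: metadata in memory, peak lists on disk.
# All names here are hypothetical; only base R is used.
make_lazy_store <- function(peaks, meta, dir = tempfile("peaks_")) {
  stopifnot(length(peaks) == nrow(meta))
  dir.create(dir, recursive = TRUE)
  # Write one small file per spectrum; only the path stays in memory.
  meta$peakFile <- vapply(seq_along(peaks), function(i) {
    f <- file.path(dir, sprintf("scan_%05d.rds", i))
    saveRDS(peaks[[i]], f)
    f
  }, character(1))
  list(
    meta = meta,
    # Peaks are read only when requested (lazy access).
    peaks = function(i) lapply(meta$peakFile[i], readRDS)
  )
}

peaks <- list(
  cbind(mz = c(100.1, 200.2), intensity = c(10, 20)),
  cbind(mz = c(150.5, 250.5, 350.5), intensity = c(5, 15, 25))
)
meta <- data.frame(msLevel = c(1L, 2L), rtime = c(0.097, 0.35))

store <- make_lazy_store(peaks, meta)
ms2 <- which(store$meta$msLevel == 2L)  # filter on in-memory metadata only
ms2_peaks <- store$peaks(ms2)           # disk is touched only here
```

Filtering and subsetting work on the small metadata table, and the cost of reading peaks is paid only for the spectra that survive the filter.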
Yes, I confirm that the correct way forward would be to define a backend that uses the mono libraries to access the raw data and keeps the metadata in a DataFrame. See MsBackendMzR, which does exactly that but with mzR.
@lgatto @tobiasko yep; I am not so fast ... upgrading BioC from 3.7 to devel on my playground box reminds me of my 1st SuSE4.2 install in 1996 (except for changing install disks 1-3).
@lgatto, would you suggest bundling the code for the alternative backend into a separate Bioconductor package? Will Bioconductor be ok with hosting Thermo DLLs?
Yes, a separate MsBackendRawDiag package on GitHub that ships the DLLs is an option. Or, a function that downloads them upon first initialisation is also a solution.
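The "download upon first initialisation" option could look roughly like the sketch below. `ensure_cached`, the file name, and the URL are placeholders, not anything rawDiag or Thermo provides; the downloader is injectable so the logic can be tried offline.

```r
# Hypothetical helper: fetch a file once and reuse the cached copy afterwards.
# `name` and `url` are placeholders; `fetch` is injectable for offline testing.
ensure_cached <- function(name, url,
                          cache_dir = file.path(tempdir(), "assembly-cache"),
                          fetch = utils::download.file) {
  dest <- file.path(cache_dir, name)
  if (!file.exists(dest)) {
    dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
    fetch(url, dest, mode = "wb")  # binary mode matters for DLLs on Windows
  }
  dest
}

# Offline demonstration with a fake downloader that counts its calls.
calls <- 0L
fake_fetch <- function(url, destfile, mode) {
  calls <<- calls + 1L
  writeBin(as.raw(1:4), destfile)
}
cache <- tempfile("cache_")
p1 <- ensure_cached("Example.dll", "https://example.invalid/Example.dll",
                    cache, fake_fetch)
p2 <- ensure_cached("Example.dll", "https://example.invalid/Example.dll",
                    cache, fake_fetch)  # second call hits the cache
```

In a real package a persistent location such as `tools::R_user_dir(<package>, which = "cache")` (R >= 4.0) would probably be a better default than `tempdir()`.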
Pinging @jorainer
library(rawDiag)
(rawfile <- file.path(path.package(package = 'rawDiag'), 'extdata', 'sample.raw'))
system.time(PLS <-readScans(rawfile))
system.time(DF <- as.peaklistSet.DataFrame(PLS))
DF$fromFile = as.integer(1)
if(require(Spectra)){
rawDiagSample <- MsBackendDataFrame()
system.time(BE <- backendInitialize(object=rawDiagSample, files=rawfile, spectraData=DF))
}
R> library(rawDiag)
R> (rawfile <- file.path(path.package(package = 'rawDiag'), 'extdata', 'sample.raw'))
[1] "/home/cp/R/x86_64-pc-linux-gnu-library/3.6/rawDiag/extdata/sample.raw"
R> system.time(PLS <-readScans(rawfile))
user system elapsed
0.123 0.053 0.643
R> system.time(DF <- as.peaklistSet.DataFrame(PLS))
user system elapsed
0.029 0.000 0.029
R> DF$fromFile = as.integer(1)
R>
R> if(require(Spectra)){
+ rawDiagSample <- MsBackendDataFrame()
+ system.time(BE <- backendInitialize(object=rawDiagSample, files=rawfile, spectraData=DF))
+ }
user system elapsed
0.122 0.000 0.122
R> BE
MsBackendDataFrame with 574 spectra
msLevel rtime scanIndex
<integer> <numeric> <integer>
1 1 0.097 1
2 2 0.35 2
3 2 0.419 3
4 2 0.489 4
5 2 0.558 5
... ... ... ...
570 2 46.512 570
571 2 46.581 571
572 2 46.651 572
573 1 46.806 573
574 2 47.059 574
... 18 more variables/columns.
MsBackendRawDiag()
library(rawDiag)
fls <- rep(rawfile <- file.path(path.package(package = 'rawDiag'), 'extdata', 'sample.raw'), 1)
be <- backendInitialize(MsBackendRawDiag(), files = fls)
sps_thermofinnigan <- Spectra(be)
sps_thermofinnigan
R> sps_thermofinnigan
MSn data (Spectra) with 574 spectra in a MsBackendRawDiag backend:
msLevel rtime scanIndex
<integer> <numeric> <integer>
1 1 0.097 1
2 2 0.35 2
3 2 0.419 3
4 2 0.489 4
5 2 0.558 5
... ... ... ...
570 2 46.512 570
571 2 46.581 571
572 2 46.651 572
573 1 46.806 573
574 2 47.059 574
... 15 more variables/columns.
file(s):
sample.raw
Processing:
R>
Cool!!!
But this statement still confuses me: as.peaklistSet.DataFrame(...)
The above function coerces a peaklistSet object to a DataFrame? Why not simply as.DataFrame(...)? Like date <- as.Date("2017-01-01").
And I am still wondering if there is a more elegant way to initialize the backend. Maybe @lgatto can explain to us why backendInitialize(...) needs a file argument if used for MsBackendDataFrame. DF$fromFile = as.integer(1) looks creepy.
Nice! But be aware that there will be some quite substantial changes to the MsBackend:
- The @files slot will be removed; information about where the data is stored should be provided by the dataStorage spectra variable (and method).
- fromFile will be removed.
- fileNames will be removed.
I am currently finalizing the required changes in Spectra and fixing/adding unit tests. I'll ping you when it is ready.
@jorainer Thx for keeping us in the loop! Is the class definition of Spectra already stable/is it safe to inherit?
There aren't any major changes to be expected, but some small changes are very possible. If you prefer a more stable version, I would suggest waiting for a pre-Bioconductor release (I think it is conceivable that we will submit for the next release).
> Cool!!!
> But this statement still confuses me: as.peaklistSet.DataFrame(...)
> The above function coerces a peaklistSet object to a DataFrame? Why not simply as.DataFrame(...)? Like date <- as.Date("2017-01-01").
@tobiasko this is just S3 cosmetics:
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Registering-S3-methods
This is just a Hello, World! function. We won't use that S3 method.
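For readers unfamiliar with the naming issue: S3 methods are conventionally named `<generic>.<class>`, so a coercion to a data frame would usually be written as a method of an existing generic. The sketch below uses a toy `peaklistSet` with assumed fields (`scan`, `mZ`, `intensity`); it is not rawDiag's actual class layout.

```r
# Toy stand-in for a peak list set: a list of scans with class "peaklistSet".
# The field names (scan, mZ, intensity) are assumptions for this sketch.
pls <- structure(
  list(
    list(scan = 1L, mZ = c(100.1, 200.2), intensity = c(10, 20)),
    list(scan = 2L, mZ = c(150.5, 250.5, 350.5), intensity = c(5, 15, 25))
  ),
  class = "peaklistSet"
)

# S3 method for the existing generic as.data.frame(): <generic>.<class>.
as.data.frame.peaklistSet <- function(x, ...) {
  data.frame(
    scan = vapply(x, function(s) s$scan, integer(1)),
    nPeaks = vapply(x, function(s) length(s$mZ), integer(1))
  )
}

df <- as.data.frame(pls)  # dispatches on class(pls)
```

Inside a package, such a method would additionally be registered in the NAMESPACE with `S3method(as.data.frame, peaklistSet)`, which is what the linked manual section describes.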
> And I am still wondering if there is a more elegant way to initialize the backend. Maybe @lgatto can explain us why the backendInitialize(...) needs a file argument if used for MsBackendDataFrame. DF$fromFile = as.integer(1) looks creepy.
@cpanse Ok, let's discuss coding style issues offline! ;-)
We recently made some changes in the MsBackend definition:
- The @files and @modCount slots are gone.
- dataStorage and dataOrigin
The dataStorage is meant as the replacement for @files, with the difference that it returns, for each spectrum, its current storage location (e.g. mzML file, memory, HDF5 file, ...).
I don't expect any major changes anymore in the MsBackend class. I will now work mostly on implementing all missing analysis methods for Spectra.
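As a rough illustration of the per-spectrum accessors mentioned above, the following sketch builds a small in-memory Spectra object from made-up peak data and queries `dataStorage()` and `dataOrigin()`. It assumes a current Spectra release is installed and is guarded so that it does nothing otherwise; the exact values returned depend on the backend.

```r
# Guarded so the snippet is a no-op when Spectra is not installed.
if (requireNamespace("Spectra", quietly = TRUE)) {
  suppressPackageStartupMessages(library(Spectra))

  # Two made-up spectra; the values are illustrative only.
  spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(0.097, 0.35))
  spd$mz <- list(c(100.1, 200.2), c(150.5, 250.5, 350.5))
  spd$intensity <- list(c(10, 20), c(5, 15, 25))

  sps <- Spectra(spd)
  storage <- dataStorage(sps)  # one storage location per spectrum
  origin <- dataOrigin(sps)    # one data origin per spectrum
}
```

Because both accessors return one value per spectrum, a single object can in principle mix spectra that live in different files or in memory, which a single @files slot could not express.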
@jorainer @lgatto @sgibb; we are in the process of shaping a MsBackendRawFileReader package.
Package: MsBackendRawFileReader
Type: Package
Title: Bridging Spectra and ThermoFinnigan raw files
Version: 0.0.1
Authors@R: c(person(given = "Christian", family = "Panse",
        email = "cp@fgcz.ethz.ch", role = c("aut", "cre"),
        comment = c(ORCID = "0000-0003-1975-3064")),
    person(given = "Tobias", family = "Kockmann",
        email = "Tobias.Kockmann@fgcz.ethz.ch", role = "aut",
        comment = c(ORCID = "0000-0002-1847-885X")))
Depends: R (>= 3.6),
    IRanges,
    methods,
    Spectra,
    rDotNet (>= 0.9)
Suggests:
    knitr,
    testthat
Description: Implements an MsBackend for the Spectra package using
    Thermo Fisher Scientific's NewRawFileReader .Net libraries.
    The package is generalizing the functionality introduced by the
    rawDiag package (Trachsel, 2018 <doi:10.1021/acs.jproteome.8b00173>).
SystemRequirements: mono 4.x or higher on OSX / Linux, .NET 4.x or
    higher on Windows, 'msbuild' and 'nuget' available in the path
URL: https://github.com/cpanse/MsBackendRawFileReader
BugReports: https://github.com/cpanse/MsBackendRawFileReader/issues
Encoding: UTF-8
LazyData: true
NeedsCompilation: no
RoxygenNote: 6.1.1
License: GPL-3
VignetteBuilder: knitr
Collate:
    'hidden_aliases.R'
    'AllGenerics.R'
    'MsBackendRawFileReader-functions.R'
    'MsBackendRawFileReader.R'
    'zzz.R'
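Since the SystemRequirements field names external tools, a quick check of whether they are visible on the PATH might help when debugging an installation. This is only a sketch using base R; it is not part of the package, and on Windows `mono` is not expected to be present.

```r
# Report which of the tools named under SystemRequirements are on the PATH.
tools_needed <- c("mono", "msbuild", "nuget")
found <- nzchar(Sys.which(tools_needed))
names(found) <- tools_needed
if (!all(found)) {
  message("Not found on PATH: ", paste(tools_needed[!found], collapse = ", "))
}
```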
That's cool.
I initially had the impression that you were using the MsBackendDataFrame as a template. You should, however, rather look at MsBackendMzR for a better fit.
@lgatto at the moment I am only ctrl-c/ctrl-v-ing from MsBackendMzR. We have it private for the moment to avoid confusion, but if you wish, we can add all three of you to the repository at any time.
I wouldn't have much time for development at the moment, but would be happy to provide support if needed. I imagine that @jorainer and @sgibb would also contribute important feedback.
> I wouldn't have much time for development at the moment, but would be happy to provide support if needed. I imagine that @jorainer and @sgibb would also contribute important feedback.
@lgatto @jorainer @sgibb we added you to the repo just to enable some transparency and important feedback.
https://github.com/fgcz/rawDiag/blob/0d3f5d465296240d2c100357c95a2777d7e74770/R/rawDiag.R#L496
https://github.com/rformassspectrometry/Spectra