fgcz / MsBackendRawFileReader

Spectra MsBackend for Thermo Fisher Scientific's New RawFileReader
https://bioconductor.org/packages/MsBackendRawFileReader/

Implement access to all coreSpectraVariables() #17

Closed RogerGinBer closed 1 year ago

RogerGinBer commented 1 year ago

Hi @cpanse!

I was trying to expand the spectraData currently retrieved from the raw files to include relevant variables in Spectra::coreSpectraVariables() like collisionEnergy, polarity, isolationWindow, etc. For context, our group is currently working with stepped-energy HCD MS2 spectra, but conversion to mzML always removes this info and leaves us with only one collision energy.

From what I've seen, all of this data is stored in each rawrrSpectrum (i.e., not in the header), so my idea would be to retrieve the spectra in groups at some point during .backendInitialize, extract the relevant information, and add it to what's already in the spectraData DFrame.
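A rough sketch of that idea in base R, under stated assumptions: `mock_read_scans()` is a hypothetical stand-in for the per-scan reader (in practice something like `rawrr::readSpectrum(rawfile, scan = scans)` could play that role), the field names `collisionEnergy` and `polarity` inside each mock spectrum are illustrative and not the names rawrr actually uses, and a plain `data.frame` stands in for the spectraData DFrame.

```r
# Hypothetical stand-in for a per-scan reader. The returned fields are
# illustrative assumptions, not the actual contents of a rawrrSpectrum.
mock_read_scans <- function(scans) {
  lapply(scans, function(s) {
    list(scan = s,
         collisionEnergy = if (s %% 2 == 0) "20;35;50" else NA_character_,
         polarity = 1L)
  })
}

# Read scans in chunks and collect one row of per-scan metadata per spectrum.
collect_scan_metadata <- function(scans, read_fun, chunk_size = 100L) {
  # Split the scan numbers into consecutive groups of at most chunk_size.
  chunks <- split(scans, ceiling(seq_along(scans) / chunk_size))
  rows <- lapply(chunks, function(chunk) {
    spectra <- read_fun(chunk)
    data.frame(
      scanIndex = vapply(spectra, function(x) as.integer(x$scan), integer(1)),
      collisionEnergy = vapply(spectra, function(x) x$collisionEnergy,
                               character(1)),
      polarity = vapply(spectra, function(x) x$polarity, integer(1)),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

# Header-derived table (stand-in for the existing spectraData DFrame).
header <- data.frame(scanIndex = 1:5, msLevel = c(1L, 2L, 1L, 2L, 1L))
extra <- collect_scan_metadata(header$scanIndex, mock_read_scans,
                               chunk_size = 2L)

# Join on scan number, then restore the original scan order.
merged <- merge(header, extra, by = "scanIndex", all.x = TRUE, sort = FALSE)
merged <- merged[order(merged$scanIndex), ]
rownames(merged) <- NULL
```

Reading in chunks keeps memory bounded for large files. How stepped energies should be stored is left open here: as far as I know the core `collisionEnergy` variable in Spectra is numeric, so several energies per scan would probably need a separate variable rather than the string used in this mock.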

Does that sound good? Would you be interested in a PR?

Cheers, Roger

cpanse commented 1 year ago

Hoi @RogerGinBer, sounds good. Feel free to make a pull request if you know how to integrate it; otherwise, I will try it. Remember that not all values are available for all Thermo Fisher Scientific MS instruments. Thanks, Christian

RogerGinBer commented 1 year ago

Hi Christian, I'll give it a try by myself first and see how it goes 👍 Do you by chance have a reference table of available values/instruments? I have plenty of Orbitrap ID-X data as reference, but I'm not so familiar with other instruments.

cpanse commented 1 year ago

I, unfortunately, do not have a reference table. C

tobiasko commented 1 year ago

Unfortunately, Thermo never published proper documentation for the RawFileReader. Even the homepage disappeared when Planet Orbitrap was "reorganised". This is the only trace that I can still find: analyteguru. For the old MSFileReader, really nice documentation exists, see here. So when it comes to finding out which keys exist and what they mean, you are basically on your own or need to ask within the community. Many companies and open source projects use the RawFileReader for their own software. These people might know specific key:value pairs from their own work. Other option: ask Thermo directly.

My impression is: Thermo adds key:value pairs to their raw files as needed by the hardware people. There is also no common convention for naming the same things across hardware platforms. It is a big mess.

Hope this helps, Tobi

RogerGinBer commented 1 year ago

Thanks @tobiasko for the info! I think I'll contact Thermo directly. In the meantime, I'll get started with the implementation using the RAW test data in this package and files of my own. Roger

cpanse commented 1 year ago

Good luck!

tobiasko commented 1 year ago

@RogerGinBer, it might also be interesting to have some ID-X data in our tartare package. Tartare is a Bioconductor experiment data package used for extended testing. So far we only cover bottom-up proteomics LC-MS for Thermo HF-X and Fusion Lumos.

cpanse commented 1 year ago

The rawrr vignette shows how to use tartare: https://bioconductor.org/packages/release/bioc/vignettes/rawrr/inst/doc/rawrr.html

cpanse commented 1 year ago

Dear @RogerGinBer,

It looks good.

1. I merged your request. Thank you!

2. Please note that I renamed the default branch of fgcz/MsBackendRawFileReader from master to devel, as requested here: https://bioconductor.github.io/biocblog/posts/2023-03-01-transition-to-devel/.

3. I pushed 1.5.2 to Bioconductor: https://bioconductor.org/packages/devel/bioc/html/MsBackendRawFileReader.html

Thanks again,

Christian