fgcz / rawDiag

Brings Orbitrap mass spectrometry data to life; multi-platform, fast and colorful R package
https://bioconductor.org/packages/rawDiag

.computeBestPeptideSpectrumMatch function #65

Closed NicolasEsk closed 3 years ago

NicolasEsk commented 3 years ago

In your latest publication (Proteomics Forum 2019 poster) you mention a PSM computation snippet that uses the function .computeBestPeptideSpectrumMatch. What it is supposed to do is very interesting to me; however, this function doesn't exist in rawDiag (I tried 0.0.34 and 0.0.38), and I can't find where it comes from. Could you tell me what I am doing wrong?

Thank you so much for rawDiag, this is truly a great piece of work!

Nicolas

tobiasko commented 3 years ago

Hi @NicolasEsk,

That is a good question. I went back to the code used for the poster, but I cannot determine which rawDiag version was used. Maybe @cpanse can reveal the secret? :-)

cpanse commented 3 years ago

We have a self-explanatory vignette for computing the matches here: fgcz-ms.uzh.ch/~cpanse/rawDiagXICdemo.pdf

cpanse commented 3 years ago

the code snippet is here:

#R

.computeBestPeptideSpectrumMatch <- function(rawfile="/Users/cp/Downloads/20180220_14_autoQC01.raw",
                         pepSeq = c("LGGNEQVTR", "GAGSSEPVTGLDAK", "VEATFGVDESNAK",
                                    "TPVISGGPYEYR", "YILAGVENSK", "TPVITGAPYEYR", "DGLDAASYYAPVR",
                                    "ADVTPADFSEWSK", "GTFIIDPGGVIR", "GTFIIDPAAVIR", "LFLQFGAQGSPFLK"),
                         peptideMassTolerance = 0.003){
    mass2Hplus <- (parentIonMass(pepSeq) + 1.008) / 2

    S <- read.raw(rawfile)
    S <- S[which(S$MSOrder == "Ms2"), ]  # keep MS2 scans only; safe even if every row is Ms2

    idx <- lapply(mass2Hplus, function(m){
        which(abs(S$PrecursorMass - m) < peptideMassTolerance)
    })

    # translate row indices of the filtered table into scan numbers
    scanNumbers <- lapply(idx, function(x){S$scanNumber[x]})

    HCDIons <- function (b, y)
    {
        Hydrogen <- 1.007825
        Oxygen <- 15.994915
        Nitrogen <- 14.003074
        # c <- b + (Nitrogen + (3 * Hydrogen))
        # z <- y - (Nitrogen + (3 * Hydrogen))
        # return(cbind(b, y,c ,z))
        return(cbind(b, y))
    }

    bestMatchingMS2Scan <- sapply(seq_along(pepSeq), function(i){
        PL <- readScans(rawfile, scans = scanNumbers[[i]])

        pp <- lapply(PL, function(x){psm(pepSeq[i], x, FUN = HCDIons, plot = FALSE)})

        # find best scoring spectra
        score <- sapply(1:length(pp),
                        function(j){
                            sum(PL[[j]]$intensity[abs(pp[[j]]$mZ.Da.error) < 0.1])})
        bestFirstMatch <- which(max(score, na.rm = TRUE) == score)[1]
        scanNumbers[[i]][bestFirstMatch]
    })

    bestMatchingMS2Scan
}
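
The first step of the snippet, turning a peptide sequence into the m/z of its doubly protonated precursor and matching it against the recorded precursor masses, can be illustrated with a minimal base R sketch. It assumes that parentIonMass() returns the singly protonated mass [M+H]+; the residue table, the helper mz_2plus() and the precursor values are hypothetical and only cover the first iRT peptide.

```r
# Monoisotopic residue masses (Da), only for the amino acids in LGGNEQVTR.
residue_mass <- c(G = 57.02146, V = 99.06841, T = 101.04768, L = 113.08406,
                  N = 114.04293, Q = 128.05858, E = 129.04259, R = 156.10111)
water  <- 18.01056   # H2O added to the residue sum
proton <- 1.00728    # mass of one proton

# Hypothetical helper: m/z of the doubly protonated peptide, [M + 2H]2+.
mz_2plus <- function(seq) {
  aa <- strsplit(seq, "")[[1]]
  stopifnot(all(aa %in% names(residue_mass)))
  neutral <- sum(residue_mass[aa]) + water
  (neutral + 2 * proton) / 2
}

mz <- mz_2plus("LGGNEQVTR")
print(round(mz, 4))

# Made-up precursor m/z values standing in for S$PrecursorMass.
precursor <- c(487.2569, 487.2700, 644.8230)
hits <- which(abs(precursor - mz) < 0.003)  # same tolerance as the snippet default
print(hits)
```

The snippet adds 1.008 rather than the proton mass; under the assumption above this shifts the target by roughly 0.0004 m/z, which is well inside the default tolerance of 0.003.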

NicolasEsk commented 3 years ago

Thanks for both your answers. Before trying to integrate this into my code, I ran the snippet as is (I only changed the rawfile path to a file that works and gives me good results with other rawDiag functions). I ran the snippet you sent and then went back to the poster's code, but I get the error below. Could it be just a formatting inconsistency? I have limited knowledge of R, so it might be something obvious. Note that my data is DDA, not PRM.

scanIds <- .computeBestPeptideSpectrumMatch(rawfile, c("LGGNEQVTR", "GAGSSEPVTGLDAK", "VEATFGVDESNAK", "TPVISGGPYEYR", "YILAGVENSK", "TPVITGAPYEYR", "DGLDAASYYAPVR", "ADVTPADFSEWSK", "GTFIIDPGGVIR", "GTFIIDPAAVIR", "LFLQFGAQGSPFLK"), peptideMassTolerance = 0.003)
system2 is writting to tempfile C:\Users\user\AppData\Local\Temp\Rtmp2xSCc3\file6c5834eb2201tsv ...
unlinking C:\Users\user\AppData\Local\Temp\Rtmp2xSCc3\file6c5834eb2201tsv ...
MasterScanNumber calculated
renamed LMmZCorrectionppm to LMCorrection
renamed AGCPSMode to PrescanMode
Error accessing RAWFileReader library! - Error while retrieving centroid peaks for 0. The scan number must be >= 1 and <= 21274.
Memory Usage:
   Before 16340 kb, After 111328 kb, Extra 94988 kb
Error in PL[[j]] : subscript out of bounds
In addition: Warning message:
In is.rawDiag(object) :
  missing column name(s): MasterScanNumber, LMCorrection, ElapsedScanTimesec, transient, AGCMode, PrescanMode
Called from: FUN(X[[i]], ...)
Error during wrapup: unimplemented type (29) in 'eval'
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Error during wrapup: INTEGER() can only be applied to a 'integer', not a 'unknown type #29'
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

cpanse commented 3 years ago

You ran out of disk space. You have two options:

  1. Do not read all spectra of the rawfile at once.

  2. Or use the latest version and set the tmp space:

    install.packages('http://fgcz-ms.uzh.ch/~cpanse/rawDiag_0.0.38.tar.gz', repos = NULL)

NicolasEsk commented 3 years ago

Perfect, works like a charm now 💯 Thanks for the help, and even more for the package as a whole

Nicolas

NicolasEsk commented 3 years ago

Salutations

For some obscure reason, the last problem is back. Two weeks ago, after updating to version 0.0.38, the error disappeared for every raw file I was using as a test file. Now there's a new file, and the problem is back.

scanIds <-  .computeBestPeptideSpectrumMatch(rawfile, substr(XICseq, 1, nchar(XICseq) - 3), peptideMassTolerance = 0.02)
Error accessing RAWFileReader library! - Error while retrieving centroid peaks for 0. The scan number must be >= 1 and <= 21116.
Memory Usage:
   Before 16424 kb, After 107224 kb, Extra 90800 kb
Error in PL[[j]] : subscript out of bounds
Called from: FUN(X[[i]], ...)
Browse[1]> scans <- readScans(rawfile, scanIds)
Error during wrapup: object 'scanIds' not found
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

I don't know whether my solution was actually working back then... I added this before my code; is it correct?

rm(list = ls())
write("TMPDIR = 'E:/RTemp'", file=file.path(Sys.getenv('R_USER'), '.Renviron'))

My E drive has 8 TB of free space, so I don't see how it could be space related (or maybe RStudio has a maximum buffer I can change somehow?)
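
One way to check whether such a TMPDIR setting has taken effect is to inspect the session itself. R chooses its temporary directory once at startup and reads .Renviron at startup too, so editing the file does not change a session that is already running; the check is only meaningful after a restart. A minimal sketch using base R only (the variable names are illustrative):

```r
# Environment variable as seen by this session ("<not set>" if absent).
tmp_env <- Sys.getenv("TMPDIR", unset = "<not set>")

# Directory this session actually uses for tempfile(); fixed at startup.
session_tmp <- tempdir()

cat("TMPDIR env var  :", tmp_env, "\n")
cat("session tempdir :", session_tmp, "\n")

# A tempfile created now is placed inside session_tmp.
probe <- tempfile(fileext = ".tsv")
cat("example tempfile:", probe, "\n")
```

If the session tempdir still points to the AppData location seen in the earlier log, the setting was not picked up. Note also that write() with its default append = FALSE replaces any existing .Renviron content.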

Thank you for your help,
Nicolas

cpanse commented 3 years ago

Hoi Nicolas, this is an R helper function written for our poster. It addresses a targeted approach, namely detecting Biognosys iRT peptides, and was never intended to be used for anything else. I'm sorry to say you will have to debug that function on your own. Good luck, Christian

tobiasko commented 3 years ago

Hi @NicolasEsk ,

Does your sample contain any iRT peptides, and how was the data recorded (is it PRM)? To me it looks like it does not, because a scan with index 0 is requested:

Error while retrieving centroid peaks for 0. The scan number must be >= 1 and <= 21116.

This most likely happens because there are no matching MS2 scans in your data.

NicolasEsk commented 3 years ago

I am not using iRT as I wanted to adapt the function to my project, but I really think this shouldn't make the script break. I still need more testing but indeed it seems you are right Tobias. I am doing ddMS2 so if the peptide isn't selected for fragmentation, scanNumber will return zeros which are obviously not found as a valid scan number in the next steps of the function. I'll try running PRM runs as soon as possible to keep you updated on the problem.
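
The failure mode discussed here can be avoided by checking for empty matches before any scan is read. In the snippet, which() returns an empty index vector when no precursor falls within the tolerance, and that empty request is then passed on to readScans(). A minimal base R sketch of such a guard, using a made-up scan table in place of the read.raw() output and hypothetical target names and m/z values:

```r
# Toy scan table standing in for the MS2 rows of read.raw() (values are made up).
S <- data.frame(scanNumber    = c(101L, 102L, 205L),
                PrecursorMass = c(487.2569, 487.2571, 644.8230))

# Hypothetical targets: pepA has candidate scans, pepB has none.
targets <- c(pepA = 487.2567, pepB = 699.3384)
tol <- 0.003

# Same lookup as in the snippet: candidate scan numbers per target.
scanNumbers <- lapply(targets, function(m) {
  S$scanNumber[abs(S$PrecursorMass - m) < tol]
})

# Guard: only targets with at least one candidate scan go on to be read.
has_match <- lengths(scanNumbers) > 0
if (any(!has_match)) {
  message("no MS2 scan within tolerance for: ",
          paste(names(targets)[!has_match], collapse = ", "))
}
print(scanNumbers[has_match])
```

With DDA data this reports the peptides that were never selected for fragmentation instead of stopping with an error; whether to skip them or return NA for them is a design choice left to the caller.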

NicolasEsk commented 3 years ago

It looks like that was it. The error never happened with any of the PRM runs.

Thanks a lot for the precious help