KujawinskiLaboratory / Autotuner

This repo contains the code needed to run the R package Autotuner. Autotuner is used to identify proper parameters during metabolomics data processing.
MIT License

Parsing error #29

Closed hh1985 closed 4 years ago

hh1985 commented 4 years ago

Hi,

I tried to process metabolomics data and got the following error: Error: There was a problem finding spectrum IDs within header file for this data. Error occured after function 'dissectScans'.

By debugging the function dissectScans,

function (mzDb, observedPeak, header) 
{
  scansOfPeak <- which(observedPeak$start < header$retentionTime & 
    header$retentionTime < observedPeak$end)
  peakHead <- header[scansOfPeak, ]
  ms1 <- peakHead$msLevel == 1L
  scanID <- as.numeric(sub("(.* )?scan=|(.* )?scanId=", "", 
    peakHead$spectrumId[ms1]))
 ...
}

I found that scanID comes back as NA because peakHead$spectrumId holds values such as "sample=1 period=1 cycle=233 experiment=1", which contain no scan or scanId field.
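
The failure can be reproduced in isolation with base R. This is a minimal sketch: the pattern is copied from dissectScans above, the second ID is the one reported in this issue, and the first ID is only an assumed example of a spectrumId that does carry a scan= field.

```r
# Pattern copied from dissectScans
pattern <- "(.* )?scan=|(.* )?scanId="

# Assumed example of an ID with a scan= field (illustrative only)
id_with_scan <- "controllerType=0 controllerNumber=1 scan=233"
# ID format reported in this issue for the converted wiff data
id_without_scan <- "sample=1 period=1 cycle=233 experiment=1"

# sub() strips everything up to and including "scan=", leaving the number
scan_ok <- as.numeric(sub(pattern, "", id_with_scan))

# Nothing matches here, so sub() returns the string unchanged and
# as.numeric() coerces it to NA (with a warning, suppressed for clarity)
scan_na <- suppressWarnings(as.numeric(sub(pattern, "", id_without_scan)))

print(scan_ok)
print(scan_na)
```

Because sub() returns its input unchanged when the pattern does not match, the NA only appears at the as.numeric() step, which is why the error surfaces after dissectScans rather than inside the regular expression itself.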

Other columns such as spIdx, acquisitionNum, and spectrum are fine:

          scanWindowUpperLimit spectrum
F1.S01459                 1500     1459
F1.S01464                 1500     1464
F1.S01473                 1500     1473
F1.S01484                 1500     1484
F1.S01495                 1500     1495
F1.S01506                 1500     1506
F1.S01517                 1500     1517
F1.S01528                 1500     1528
F1.S01539                 1500     1539
F1.S01550                 1500     1550
F1.S01561                 1500     1561
F1.S01572                 1500     1572
F1.S01583                 1500     1583
F1.S01594                 1500     1594
F1.S01605                 1500     1605
F1.S01616                 1500     1616
F1.S01627                 1500     1627
F1.S01638                 1500     1638
F1.S01649                 1500     1649
F1.S01660                 1500     1660
F1.S01671                 1500     1671
F1.S01682                 1500     1682
F1.S01693                 1500     1693
F1.S01704                 1500     1704
F1.S01715                 1500     1715

This looks like a bug. Could the scan ID be taken from one of the other columns when spectrumId contains no scan or scanId field?
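
One way such a fallback could look is sketched below. The helper name resolve_scan_ids is hypothetical and not part of Autotuner, and the small header data frame is made up for illustration; only the column names spectrumId and spectrum and the regular expression come from this issue.

```r
# Hypothetical helper (not part of Autotuner): one scan ID per header row.
resolve_scan_ids <- function(header) {
  pattern <- "(.* )?scan=|(.* )?scanId="
  # Rows whose spectrumId actually carries a scan= or scanId= field
  has_scan <- grepl("scan=|scanId=", header$spectrumId)
  ids <- rep(NA_real_, nrow(header))
  ids[has_scan] <- suppressWarnings(
    as.numeric(sub(pattern, "", header$spectrumId[has_scan]))
  )
  # Fall back to the `spectrum` column wherever no scan number was parsed
  missing <- is.na(ids)
  ids[missing] <- as.numeric(header$spectrum[missing])
  ids
}

# Made-up header with one ID of each kind
header <- data.frame(
  spectrumId = c("sample=1 period=1 cycle=233 experiment=1",
                 "controllerType=0 controllerNumber=1 scan=42"),
  spectrum = c(1459, 2),
  stringsAsFactors = FALSE
)

scan_ids <- resolve_scan_ids(header)
print(scan_ids)
```

Falling back row by row keeps parsed scan numbers where they exist and only substitutes the spectrum index for rows that would otherwise be NA.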

The test data (AB TripleTOF 6600 datasets) is from https://drive.google.com/drive/folders/1PRDIvihGFgkmErp2fWe41UR2Qs2VY_5G

Thanks,

-Han

crmclean commented 4 years ago

Dear Han,

Sorry for not responding sooner. I've been quite busy the past few days. I'll try and take a look at this today.

All the best, Craig

crmclean commented 4 years ago

Dear Han,

Thank you for your detailed review of my code. This is the first time I have worked with wiff files, and I think the bug can be attributed to that. I added some if statements to acquire scan number information from the column "spectrum", as you pointed out above. Please run the updated version of the code from GitHub and let me know whether it works for you. I was not able to reproduce your error, as I could not load the files into xcms. Could you please send me a script showing how you load the data into AutoTuner? I'd like to add wiff file handling to other parts of the software. Hopefully, the changes let you run AutoTuner despite this.

All the best, Craig

hh1985 commented 4 years ago

Hi Craig,

I converted the wiff files to mzML format with msconvert from ProteoWizard. To run the conversion on a Linux server, I use Docker: https://hub.docker.com/r/chambm/pwiz-skyline-i-agree-to-the-vendor-licenses

-Han

crmclean commented 4 years ago

Ah, gotcha. One of the docs for xcms said it would accept wiff files if you loaded both the original and the scan file, but I wasn't able to figure it out in a short amount of time. If the solution does not work, would you mind sending me a few of the mzML files you generated?

hh1985 commented 4 years ago

@crmclean Absolutely not.

hh1985 commented 4 years ago

@crmclean The updated commit didn't work. I added a few lines in EICParams.R and this works for me:

        header <- suppressWarnings( MSnbase::header(msnObj))

        # -- Begin: fix the header scanId problem
        scanID <- as.numeric(sub("(.* )?scan=|(.* )?scanId=", "", header$spectrumId))
        if (any(is.na(scanID))) {
          header$spectrumId <- stringr::str_c(header$spectrumId, " scanId=", header$spectrum)
        }
        # -- End

        allMzs <- MSnbase::mz(msnObj)

BTW, I am trying to benchmark the tuning algorithms (MetaboAnalystR 3, IPO, Autotuner) using the dataset from Li et al. 2018, "Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection".

The results are interesting: Autotuner gives more identified peaks, but quantification is not as good as others. I will dig more into it.

crmclean commented 4 years ago

Thanks for the fix! I'll update the code. Super curious, what do you mean by "quantification"?

hh1985 commented 4 years ago

@crmclean The actual log ratio vs. the expected log ratio (ground truth).
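
As a rough illustration of that comparison, here is a minimal sketch. All intensities and the 2:1 expected ratio are hypothetical values chosen for the example, not taken from the benchmark dataset, and the variable names are assumptions.

```r
# Hypothetical intensities of one feature in two sample groups
group_a <- c(1000, 1100, 950)
group_b <- c(2100, 1900, 2000)

# Assumed ground truth: group B was prepared at twice the concentration
expected_log_ratio <- log2(2)

# Observed log ratio from the mean intensities of each group
observed_log_ratio <- log2(mean(group_b) / mean(group_a))

# Quantification error: how far the observed ratio is from the ground truth
log_ratio_error <- observed_log_ratio - expected_log_ratio
print(log_ratio_error)
```

An error near zero means the detected peak areas reproduce the known ratio well; larger absolute errors indicate poorer quantification for that feature.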

crmclean commented 4 years ago

Gotcha. Thanks for clarifying.