KujawinskiLaboratory / Autotuner

This repo contains the code needed to run the R package Autotuner. Autotuner is used to identify proper parameters during metabolomics data processing.
MIT License

Can't read mzXML or mzML #19

Closed stolltho closed 4 years ago

stolltho commented 4 years ago

Hi Craig,

I am using msconvert (default settings) to convert Agilent QTOF data to either mzXML or mzML. Both formats give the following error when executing EICparams():

Currently on sample 1
--- Currently on peak: 1
Error: There was a problem finding spectrum IDs within header file for this data. Error occured after function 'dissectScans'.
In addition: Warning message:
In dissectScans(mzDb, observedPeak = observedPeak, header = header) :
  NAs introduced by coercion

Cheers, Thomas
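For readers unfamiliar with the warning: "NAs introduced by coercion" is what R emits when as.numeric() is applied to strings that are not plain numbers. The thread never identifies which values failed inside dissectScans, so the following is only a base-R sketch of that general failure mode. The ID strings and the helper `extract_scan_number` are hypothetical and are not part of Autotuner.

```r
# Hypothetical spectrum ID strings; real files may use other layouts.
ids_plain  <- c("1", "2", "3")
ids_tagged <- c("scanId=1001", "scanId=1002")

# Purely numeric strings coerce cleanly.
plain_numbers <- as.numeric(ids_plain)

# Tagged IDs become NA and would normally raise
# "NAs introduced by coercion" (suppressed here to keep output quiet).
coerced <- suppressWarnings(as.numeric(ids_tagged))

# Hypothetical helper: drop everything up to the last non-digit
# character, then convert the remaining digits.
extract_scan_number <- function(ids) {
  as.numeric(sub("^.*[^0-9]", "", ids))
}

print(coerced)
print(extract_scan_number(ids_tagged))
```

If a header column holds IDs in a tagged layout, a direct numeric conversion fails in this way while a digit-extracting conversion does not. Whether that is what happened with these files is an assumption.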

crmclean commented 4 years ago

Dear Thomas,

I had a similar issue when I first tried some orbitrap samples. I'm wondering if it might be something related to the file structure of the data coming from your instrument. Would you be ok with sharing a few files with me and some metadata? I would be happy to spend some time debugging.

All the best, Craig

stolltho commented 4 years ago

Sure, Craig. I can send you a download link for 2 mzXML files (320 MB total) if you want to provide an email address.

Cheers, Thomas


crmclean commented 4 years ago

Dear Thomas,

Fantastic. Please send them to crmclean@mit.edu.

All the best, Craig

stolltho commented 4 years ago

Hi Craig

You should have received a link to download the files (from owncloud or qimr).

p-w to download: Autotuner

Cheers,

Thomas


crmclean commented 4 years ago

Dear Thomas,

I haven't received an email yet. Were the files sent?

All the best, Craig

stolltho commented 4 years ago

Hi Craig

I sent a link last Friday and have just re-sent it to crmclean@mit.edu. Alternatively, use this URL (password 'Autotuner'): https://qdocs.qimrberghofer.edu.au/owncloud/index.php/s/wsCQrjkZt5Eifpe

There are 3 files now: Agilent QTOF raw file as .zip, mzXML and mzML files via msconvert.

Hope either works, best Thomas


stolltho commented 4 years ago

Hi Craig, the email sent to you should look like this: -Thomas

[image: image.png]


crmclean commented 4 years ago

Thanks, Thomas! I'll do my best to figure out the bug soon.

crmclean commented 4 years ago

Dear Thomas,

I apologize for troubling you. I should have been clearer when I asked you to send me some data: I need two more distinct samples to run the peak selection part of the algorithm. Could you please send me these data? Either mzML or mzXML format will do. Also, please let me know how the samples relate to one another, for example whether they are replicates or differ by an experimental factor. It would help me with the peak selection step.

All the best, Craig

stolltho commented 4 years ago

Hi Craig,

No problem at all. Here are some more files: https://qdocs.qimrberghofer.edu.au/owncloud/index.php/s/VowjFsOkZxOr0JH

They are all replicate QC injections within the same analysis sequence.

Cheers, Thomas
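The sample relationship matters because the script used in this thread passes a metadata table to createAutotuner() with a file-name column (file_col) and a grouping column (factorCol). A minimal sketch of such a table for replicate QC injections follows. The file names and the "QC" level are hypothetical, the column names are the ones used in the script posted in this thread, and whether a single-level factor is sufficient for Autotuner is not stated here.

```r
# Hypothetical file names; replace with the real converted files.
raw_files <- c("QC_01.mzML", "QC_02.mzML", "QC_03.mzML")

# Replicate QC injections all belong to one group, so the factor
# column holds the same (hypothetical) level for every file.
metadata <- data.frame(
  Raw.Spectral.Data.File = raw_files,
  Factor.Value.genotype. = "QC",
  stringsAsFactors = FALSE
)

print(metadata)
```

The file names in the first column need to match basename() of the raw data paths, since the script subsets the metadata on that match.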


crmclean commented 4 years ago

Dear Thomas,

Thanks again for sharing your data with me. I was able to run your samples no problem. I did notice that the DKE plots were a bit hard to interpret, so I updated the code involved in the plots. What is your email? I'm happy to share the parameter estimates along with the plots with you.

Where did you download the package? I'm wondering if maybe one of the other repositories isn't up to date.

All the best, Craig

stolltho commented 4 years ago

Hi Craig,

I am running the following script, copied from https://www.bioconductor.org/packages/release/bioc/vignettes/Autotuner/inst/doc/Autotuner.R, with Autotuner installed from Bioconductor. It all runs fine until "Part 2 - Parameter Extraction from Individual Extracted Ion Chromatograms", when executing eicParamEsts <- EICparams(). I am getting the following error:

Currently on sample 1
--- Currently on peak: 1
Error: There was a problem finding spectrum IDs within header file for this data. Error occured after function 'dissectScans'.
In addition: Warning message:
In dissectScans(mzDb, observedPeak = observedPeak, header = header) :
  NAs introduced by coercion

Cheers, Thomas

# --------------------------------------------
# Installation
# --------------------------------------------
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("Autotuner")
BiocManager::install("MSnbase")

# --------------------------------------------
# Load libraries
# --------------------------------------------
library(Autotuner)
library(mtbls2)

# --------------------------------------------
# Input - raw data
# --------------------------------------------
setwd("C:/R_WorkingDirectory/XCMS") #set WD

path_to_raw_data <- "C:/R_WorkingDirectory/XCMS/Autotuner"
rawPaths <- list.files(path_to_raw_data, recursive = TRUE, full.names = TRUE)

print(basename(rawPaths))
print(rawPaths)

# ---------------------------------
# Input - Metadata
# ---------------------------------
# metadata file in WD
metadata <- read.table("md.txt", header = TRUE, stringsAsFactors = FALSE)

# subset metadata for files to run Autotuner on
metadata <- metadata[metadata$Raw.Spectral.Data.File %in% basename(rawPaths),]

print(metadata)

# ---------------------------------
# Creating Autotuner Object
# ---------------------------------
Autotuner <- createAutotuner(rawPaths,metadata, file_col = "Raw.Spectral.Data.File", factorCol = "Factor.Value.genotype.")

# -----------------------------------------------
# Part 1 -Total Ion Current Peak Identification
# -----------------------------------------------
# Sliding window analysis
# Lag - The number of chromatographic scan points used to test if next point
# is significant (ie the size number of points making up the moving average).
# Threshold - A numerical constant representing how many times greater the
# intensity of an adjacent scan has to be from the scans in the sliding window to be considered significant.
# Influence - A numerical factor used to scale the magnitude of a
# significant scan once it has been added to the sliding window.
lag <- 30 #was 25
threshold<- 3.1
influence <- 0.1
signals <- lapply(getAutoIntensity(Autotuner), ThresholdingAlgo, lag, threshold, influence)

# The output of the sliding window can be displayed with the plot_signals
# function:
plot_signals(Autotuner, threshold,
         # index for which data files should be displayed
         sample_index = 1:2,
         signals = signals)

rm(lag, influence, threshold)

# Interpreting Sliding Window Results
Autotuner <- isolatePeaks(Autotuner = Autotuner, returned_peaks = 10, signals = signals)

# Checking Peak Estimates
for(i in 1:5) {
    plot_peaks(Autotuner = Autotuner, boundary = 100, peak = i)
}

# ---------------------------------------------------------------------------
# Part 2 - Parameter Extraction from Individual Extracted Ion Chromatograms
# ---------------------------------------------------------------------------
# error with peak width estimation
# idea - filter things by mass. smaller masses are more likely to be
# random assosications
eicParamEsts <- EICparams(Autotuner = Autotuner, massThresh = .005, verbose = FALSE, returnPpmPlots = FALSE, useGap = TRUE)

# ---------------------------------
# Part 3 - Returning estimates
# ---------------------------------
returnParams(eicParamEsts, Autotuner)

# Running AutoTuner is now complete, and the estimates may be entered
# directly into XCMS to processes raw untargeted metabolomics data.

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C LC_TIME=English_Australia.1252

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] MSnbase_2.12.0 ProtGenerics_1.18.0 S4Vectors_0.24.3 mzR_2.20.0 Rcpp_1.0.4 Biobase_2.46.0
[7] BiocGenerics_0.32.0 mtbls2_1.16.0 Autotuner_1.0.1 MSstatsQC_2.4.0

loaded via a namespace (and not attached):
[1] backports_1.1.5 Hmisc_4.4-0 fastmatch_1.1-0 plyr_1.8.6 igraph_1.2.5
[6] RecordLinkage_0.4-12 lazyeval_0.2.2 CAMERA_1.42.0 splines_3.6.3 BiocParallel_1.20.1
[11] crosstalk_1.1.0.1 usethis_1.6.0 ggplot2_3.3.0 digest_0.6.25 foreach_1.5.0
[16] htmltools_0.4.0 fansi_0.4.1 magrittr_1.5 checkmate_2.0.0 memoise_1.1.0
[21] cluster_2.1.0 doParallel_1.0.15 remotes_2.1.1 limma_3.42.2 Nozzle.R1_1.1-1
[26] prettyunits_1.1.1 jpeg_0.1-8.1 colorspace_1.4-1 blob_1.2.1 IPO_1.12.0
[31] xfun_0.13 dplyr_0.8.5 callr_3.4.3 crayon_1.3.4 jsonlite_1.6.1
[36] graph_1.64.0 ffbase_0.12.8 impute_1.60.0 survival_3.1-11 iterators_1.0.12
[41] glue_1.4.0 gtable_0.3.0 ipred_0.9-9 zlibbioc_1.32.0 pkgbuild_1.0.6
[46] evd_2.3-3 DEoptimR_1.0-8 scales_1.1.0 vsn_3.54.0 DBI_1.1.0
[51] miniUI_0.1.1.1 viridisLite_0.3.0 xtable_1.8-4 htmlTable_1.13.3 foreign_0.8-75
[56] bit_1.1-15.2 preprocessCore_1.48.0 Formula_1.2-3 lava_1.6.7 prodlim_2019.11.13
[61] htmlwidgets_1.5.1 httr_1.4.1 RColorBrewer_1.1-2 acepack_1.4.1 ellipsis_0.3.0
[66] ff_2.2-14 pkgconfig_2.0.3 XML_3.99-0.3 farver_2.0.3 nnet_7.3-12
[71] labeling_0.3 tidyselect_1.0.0 rlang_0.4.5 later_1.0.0 munsell_0.5.0
[76] tools_3.6.3 cli_2.0.2 RSQLite_2.2.0 devtools_2.3.0 stringr_1.4.0
[81] fastmap_1.0.1 mzID_1.24.0 yaml_2.2.1 fs_1.4.1 processx_3.4.2
[86] knitr_1.28 bit64_0.9-7 pander_0.6.3 robustbase_0.93-6 purrr_0.3.3
[91] RANN_2.6.1 ncdf4_1.17 nlme_3.1-144 RBGL_1.62.1 mime_0.9
[96] ggExtra_0.9 compiler_3.6.3 rstudioapi_0.11 plotly_4.9.2.1 png_0.1-7
[101] testthat_2.3.2 e1071_1.7-3 affyio_1.56.0 MassSpecWavelet_1.52.0 tibble_3.0.0
[106] stringi_1.4.6 ps_1.3.2 desc_1.2.0 lattice_0.20-38 Matrix_1.2-18
[111] multtest_2.42.0 vctrs_0.2.4 pillar_1.4.3 lifecycle_0.2.0 BiocManager_1.30.10
[116] MALDIquant_1.19.3 data.table_1.12.8 qcmetrics_1.24.0 httpuv_1.5.2 xcms_3.8.2
[121] R6_2.4.1 latticeExtra_0.6-29 pcaMethods_1.78.0 affy_1.64.0 promises_1.1.0
[126] gridExtra_2.3 IRanges_2.20.2 sessioninfo_1.1.1 ada_2.0-5 codetools_0.2-16
[131] pkgload_1.0.2 MASS_7.3-51.5 assertthat_0.2.1 rprojroot_1.3-2 withr_2.1.2
[136] mgcv_1.8-31 grid_3.6.3 rpart_4.1-15 tidyr_1.0.2 class_7.3-15
[141] rsm_2.10 shiny_1.4.0.2 base64enc_0.1-3


crmclean commented 4 years ago

Dear Thomas,

I think I understand what the issue is now. Try to download AutoTuner from the development branch rather than the release one. You can find it here:

https://bioconductor.org/packages/devel/bioc/html/Autotuner.html

I am still learning the ins and outs of Bioconductor, so I apologize for this inconvenience. I'll try and figure out how to update that branch.
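For anyone hitting the same error, a first step is to check which Autotuner version is installed (the session above reports Autotuner_1.0.1 from the release branch). The sketch below does that and defines, without running it, one possible way to get newer code: installing from the GitHub repository with the remotes package. That route is an alternative to the Bioconductor devel page linked above, not what was recommended in this thread, and it assumes the repository's default branch contains the fix. The helper names are hypothetical.

```r
# Report the installed Autotuner version, or NA if it is not installed.
# system.file() returns "" for a package that is not installed.
autotuner_version <- function() {
  if (nzchar(system.file(package = "Autotuner"))) {
    as.character(utils::packageVersion("Autotuner"))
  } else {
    NA_character_
  }
}

# Hypothetical helper: install the current GitHub source.
# Defined here only; call it explicitly when you want to install.
install_autotuner_github <- function() {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("crmclean/Autotuner")
}

print(autotuner_version())
```

Switching a whole installation to Bioconductor devel is also possible with BiocManager::install(version = "devel"), but that changes every Bioconductor package and generally expects a matching R version, so it is a heavier step than updating one package.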

stolltho commented 4 years ago

Thanks Craig. Autotuner completed now w/o errors. Cheers, Thomas


crmclean commented 4 years ago

Fantastic! I'm going to close this out now and make a new issue to change the release version.