Adapt for mzxml format - Githubissues

KujawinskiLaboratory / Autotuner

This repo contains the code needed to run the R package Autotuner. Autotuner is used to identify proper parameters during metabolomics data processing.

MIT License


Adapt for mzxml format #17

Closed yufree closed 5 years ago

yufree commented 5 years ago

https://github.com/crmclean/Autotuner/blob/d96d97649eee3ac83657e614c5dc5333ba5f7463/R/dissectScans.R#L24

https://github.com/crmclean/Autotuner/blob/19a4f457fae3a05d19d19b98eff7cb902d9be233/R/findPeakWidth.R#L48

https://github.com/crmclean/Autotuner/blob/b4aa8b540fa8d721a3424662f90663c454821de6/R/checkBounds.R#L79

If the input file is mzXML, this part should be scanId. I suggest using the following code:

scanID <- as.numeric(sub("(.* )?scan=|(.* )?scanId=", "", peakHead$spectrumId[ms1]))
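To show what the suggested pattern does, here is a minimal sketch run on a few made-up spectrum ID strings. The ID formats below are assumptions chosen for illustration, not output copied from real mzML or mzXML headers; only the `sub()` call itself comes from the suggestion above.

```r
# Illustrative spectrum ID strings; the exact formats are assumptions.
spectrumId <- c(
    "controllerType=0 controllerNumber=1 scan=12",  # assumed mzML-style ID
    "scanId=12",                                    # assumed mzXML-style ID
    "scan=7"                                        # bare scan= prefix
)

# Strip everything up to and including "scan=" or "scanId=",
# then convert the remaining digits to a number.
scanID <- as.numeric(sub("(.* )?scan=|(.* )?scanId=", "", spectrumId))
print(scanID)
```

Both prefixes reduce to the bare scan number, so the same line should cover either style of ID under these assumptions.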

I could open a PR if that is OK with you.

crmclean commented 5 years ago

Dear @yufree

Thank you so much for your helpful suggestion. I very much appreciate you identifying regions where I could improve my code. I am not too familiar with how a pull request might impact unit testing, and I have quite a bit on my plate at the moment... I went ahead and implemented the changes directly. I am very grateful for your generous offer to help me out though. The next push to this repo should contain your suggested changes, and once that passes the Travis build, it will be pushed into the Bioconductor version of the package.

Thank you once again, Craig

crmclean commented 5 years ago

Hey @yufree. Just wanted to pass on a bit of info. We just submitted the manuscript for AutoTuner to bioRxiv. You can find it here if you are interested:

https://www.biorxiv.org/content/10.1101/812370v1

yufree commented 5 years ago

Thanks! I have been waiting for this paper! Great work!

yufree commented 5 years ago

Hi Craig,

This package works fine on your data. However, it seems the current version still returns an error. This part might be the reason:

https://github.com/crmclean/Autotuner/blob/09355f03b2ecc244cbe1bd8fc68a4b2887e66b9b/R/findPeakWidth.R#L169

Even after I fixed this part, I still see an error:

Currently on sample 1
--- Currently on peak: 1
Error in if (maxPw < 5 * minPw) { : argument is of length zero

I checked and found that maxPw could be NA. In that case, the if statement would return the error above. Could you fix this issue? I have some mzXML files here which you could use for testing.
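As a side note on the message itself: in base R, an `if` condition of length zero fails with "argument is of length zero", while a single NA fails with "missing value where TRUE/FALSE needed", so maxPw may be empty rather than NA in some cases. Below is a minimal sketch of a guard that covers both. The helper `isUsableWidth` and all values are hypothetical and are not part of Autotuner.

```r
# Hypothetical helper, not part of Autotuner: TRUE only when x is a single,
# non-missing number that is safe to use inside an if() condition.
isUsableWidth <- function(x) {
    length(x) == 1 && is.numeric(x) && !is.na(x)
}

# Illustrative values only.
minPw <- 4
maxPwCandidates <- list(numeric(0), NA_real_, 30)

# Classify each candidate, skipping values that would break the comparison.
results <- vapply(maxPwCandidates, function(maxPw) {
    if (!isUsableWidth(maxPw) || !isUsableWidth(minPw)) {
        return("skipped")
    }
    if (maxPw < 5 * minPw) "narrow" else "wide"
}, character(1))

print(results)
```

Whether skipping is the right response depends on what an empty or missing peak width means in the algorithm; the sketch only shows how to avoid the crash.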

Thanks!

Miao

crmclean commented 5 years ago

Dear Miao,

Thanks again for giving me another opportunity to improve my code. Seems like you found a bug.

I am not able to access the files in the link you provided. Could you please send them to my email at crmclean@mit.edu through Dropbox, Google Docs, or the Open Science Framework?

I will be happy to look into this in the next few days.

Craig

yufree commented 5 years ago

Hi Craig,

You're welcome! I really hope this package can replace IPO, since IPO takes too much time.

The data can actually be accessed via the rmwf package. You can get the files with this code:

# Install rmwf, then copy its bundled data files to the home directory
remotes::install_github('yufree/rmwf')
path <- system.file("extdata/data", package = "rmwf")
file.copy(list.files(path, full.names = TRUE, recursive = TRUE), '~')

Then you can find the mzXML files in your home directory. There are 11 samples: 6 matrix samples and 5 NIST 1950 serum samples. I think the 5 serum samples would be suitable for the test.

After copying, you can remove the package to free up disk space:

remove.packages('rmwf')

BTW, I know mzML is actually a better format than mzXML :)

Thanks,

Miao

crmclean commented 5 years ago

Good Morning Miao

I tried to download your R package at work and home the past few days through your recommended path. Unfortunately, I was not able to complete the download...

I also tried going to your github directory to get the files from the inst/extdata/data path, and that only gave me partial files. Is there another way you could share these files with me?

Craig

yufree commented 5 years ago

Hi Craig,

I just uploaded the samples here: https://doi.org/10.6084/m9.figshare.7684046.v1 Let me know if it works.

Thanks,

Miao

crmclean commented 5 years ago

Thank you, Miao. I downloaded the files! One final question, are the files themselves replicates of a standard mix of compounds in serum? Or are they different from one another?

Craig

yufree commented 5 years ago

Hi Craig,

Those files are from the same pooled standard reference material, NIST 1950 (Metabolites in Human Plasma), and you can treat them as technical replicates. Sorry, I always mix up plasma and serum...

I use this sample as demo data since the SRM's metabolite profile has been reported, and it is easy to purchase and compare across different labs.

Thanks,

Miao

crmclean commented 5 years ago

Great! Thank you for passing this along, Miao. Just asking to cover the metadata info required by the algorithm. Also, I had not previously been able to find info on the data when I googled them, so this paper is really helpful!

Craig

crmclean commented 5 years ago

Hey Miao,

I think I've fixed the bug! Turns out your data fell victim to a corner case that I had previously implemented a solution for but had not been able to test. So again, thanks for helping me improve this code! I was able to run AutoTuner and get parameters for the five files you shared with me using the most recent version of this repo. Hopefully, you will too.

Craig

yufree commented 5 years ago

Thanks a lot! I just tested it and it works!

Miao