This is a custom fork of apLCMS containing adaptations for large scale studies.
GNU General Public License v2.0

recetox-aplcms

apLCMS is a software package that generates a feature table from a batch of LC/MS spectra. The m/z and retention time tolerance levels are estimated from the data. A run-filter is used to detect peaks and remove noise. Non-parametric statistical methods are used to fine-tune peak selection and grouping. After retention time correction, a feature table is generated by aligning peaks across spectra. For further information on apLCMS, please refer to https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/.

This is a fork of the official apLCMS repository that adapts the project for large-scale MS analyses.

Installation

The newest version of the package can be installed through conda from the bioconda channel:

conda install -c bioconda r-recetox-aplcms

Alternatively, a series of Galaxy tools is available at https://github.com/RECETOX/galaxytools/tree/master/tools/recetox_aplcms.

Usage

The tool can generate a feature table from a batch of LC/MS spectra given as a collection of mzML files. mzML is a well-tested open format for mass spectrometer output files that can be readily used by the community and easily adapted to incremental advances in mass spectrometry technology.
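Before starting a run, it can help to confirm which mzML files will make up the batch. The following is a minimal sketch, not part of the package: the helper names `list_mzml` and `count_mzml` are hypothetical, and it assumes the input files sit directly in one directory and use the `.mzML` extension.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting an input directory; not part of recetox-aplcms.

# list_mzml DIR: print the mzML files directly inside DIR (default: current
# directory), one per line, sorted by name.
list_mzml() {
    dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f -name '*.mzML' | sort
}

# count_mzml DIR: print the number of mzML files directly inside DIR.
count_mzml() {
    list_mzml "$1" | wc -l | tr -d ' '
}
```

For example, `count_mzml tests/testdata/input` would report how many input spectra have been downloaded for the tests, assuming that directory has been populated.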

In contrast to the well-known XCMS tool, apLCMS can process profile-mode data and fits a bi-Gaussian peak shape model to the data, resulting in better peak detection than XCMS. The tool also corrects drifts in retention time and outputs an aligned feature table.

The tool operates in two modes: unsupervised and hybrid. The unsupervised mode does not rely on any existing knowledge about metabolites or historically detected features. The hybrid mode, by contrast, incorporates knowledge of known metabolites and of features historically detected on the same machinery to help detect and quantify lower-intensity peaks. To use such knowledge, especially historical data, you must keep using the same chromatography system (otherwise the retention times will not match) and the same type of samples with a similar extraction technique, such as human serum. Each mode is exposed as a function of the same name, parametrised with multiple arguments.

Testing

Before running the tests, fetch the required data with the following commands:

wget -P tests/testdata/adjusted -i tests/remote-files/adjusted.txt
wget -P tests/testdata/aligned -i tests/remote-files/aligned.txt
wget -P tests/testdata/extracted -i tests/remote-files/extracted.txt
wget -P tests/testdata/input -i tests/remote-files/input.txt
wget -P tests/testdata/recovered -i tests/remote-files/recovered.txt
wget -P tests/testdata/recovered/recovered-extracted -i tests/remote-files/recovered-extracted.txt
wget -P tests/testdata/recovered/recovered-corrected -i tests/remote-files/recovered-corrected.txt
wget -P tests/testdata/filtered -i tests/remote-files/filtered.txt
wget -P tests/testdata/filtered/run_filter -i tests/remote-files/run_filter.txt
wget -P tests/testdata/features -i tests/remote-files/features.txt
wget -P tests/testdata/clusters -i tests/remote-files/clusters.txt
wget -P tests/testdata/hybrid -i tests/remote-files/hybrid.txt
wget -P tests/testdata/template -i tests/remote-files/template.txt
wget -P tests/testdata/unsupervised -i tests/remote-files/unsupervised.txt
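The commands above all follow one pattern: the URL list `tests/remote-files/<name>.txt` is downloaded into a directory under `tests/testdata` whose last path component is `<name>`. As a convenience, the same downloads can be sketched as a loop. The function name `fetch_testdata` and the `DRY_RUN` switch are assumptions made for this example and are not part of the repository. `DRY_RUN` defaults to `1` here so that the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: fetch all test data with one loop instead of fourteen commands.
# DRY_RUN is a hypothetical switch: 1 (default) prints the wget commands,
# 0 actually downloads the files.
DRY_RUN="${DRY_RUN:-1}"

fetch_testdata() {
    # Target directories under tests/testdata, copied from the commands above.
    for dir in \
        adjusted aligned extracted input \
        recovered recovered/recovered-extracted recovered/recovered-corrected \
        filtered filtered/run_filter \
        features clusters hybrid template unsupervised
    do
        # The URL list is named after the last path component of the directory.
        list="tests/remote-files/${dir##*/}.txt"
        if [ "$DRY_RUN" = "1" ]; then
            echo "wget -P tests/testdata/$dir -i $list"
        else
            # Stop at the first failed download.
            wget -P "tests/testdata/$dir" -i "$list" || return 1
        fi
    done
}

fetch_testdata
```

Run from the repository root, with `DRY_RUN=0` set in the environment, the loop performs the downloads instead of printing them.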

The hybrid and unsupervised tests are reported to be OS-specific and may fail depending on the platform they are run on. To ensure reproducibility during development, you can run the tests in a dedicated Docker container as follows:

# from the repository root run
$ docker build -t recetox-aplcms .

After the image has been built, run:

$ docker run --rm -t -v "$(pwd)":/usr/src/recetox-aplcms recetox-aplcms

This creates a container and automatically runs all the tests from the tests folder.
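The build and run steps can also be combined into one small wrapper. This is only a sketch: the function name `run_tests_in_docker` is hypothetical, while the image name and mount path are the ones used in the commands above. The function checks that the `docker` command is available before doing anything, and it skips the run step if the build fails.

```shell
#!/bin/sh
# Hypothetical wrapper around the two Docker commands; call it from the
# repository root.
run_tests_in_docker() {
    # Fail early with a clear message if Docker is not installed.
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found; please install Docker first" >&2
        return 127
    fi
    # Build the image, then run the tests with the working copy mounted.
    docker build -t recetox-aplcms . &&
        docker run --rm -t -v "$(pwd)":/usr/src/recetox-aplcms recetox-aplcms
}

# Usage (from the repository root):
#   run_tests_in_docker
```

Quoting `"$(pwd)"` keeps the volume mount working when the repository path contains spaces.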

Documentation for developers

Setting up a development environment

The development environment can be set up in two ways: with VSCode's devcontainer extension or with a Docker container.

Devcontainer

To use a devcontainer, you need VSCode with the Remote - Containers extension, and Docker installed on your machine.

Docker container

To use a Docker development environment, you need Docker installed on your machine. If you don't have Docker, you can follow the installation instructions on Docker's website.

References

Yu, T., Park, Y., Johnson, J. M. & Jones, D. P. apLCMS—adaptive processing of high-resolution LC/MS data. Bioinformatics 25, 1930–1936 (2009). DOI: 10.1093/bioinformatics/btp291.

Yu, T., Park, Y., Li, S. & Jones, D. P. Hybrid Feature Detection and Information Accumulation Using High-Resolution LC–MS Metabolomics Data. J. Proteome Res. 12, 1419–1427 (2013). DOI: 10.1021/pr301053d.

Yu, T. & Jones, D. P. Improving peak detection in high-resolution LC/MS metabolomics data using preexisting knowledge and machine learning approach. Bioinformatics 30, 2941–2948 (2014). DOI: 10.1093/bioinformatics/btu430.

Yu, T. & Peng, H. Quantification and deconvolution of asymmetric LC-MS peaks using the bi-Gaussian mixture model and statistical model selection. BMC Bioinformatics 11, 559 (2010). DOI: 10.1186/1471-2105-11-559.

Liu, Q. et al. Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing. Sci. Rep. 10, 13856 (2020). DOI: 10.1038/s41598-020-70850-0.