apLCMS is a software package that generates a feature table from a batch of LC/MS spectra. The m/z and retention time tolerance levels are estimated from the data. A run-filter is used to detect peaks and remove noise. Non-parametric statistical methods are used to fine-tune peak selection and grouping. After retention time correction, a feature table is generated by aligning peaks across spectra. For further information on apLCMS, please refer to https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/.
This is a fork of the official aplcms repo that takes the project towards large-scale MS analyses.
The newest version of the package can be installed through conda from the bioconda channel:
conda install -c bioconda r-recetox-aplcms
Alternatively, a series of Galaxy tools is available at https://github.com/RECETOX/galaxytools/tree/master/tools/recetox_aplcms.
The tool can generate a feature table from a batch of LC/MS spectra given as a collection of mzML files. mzML is a well-tested open format for mass spectrometer output files that can be readily used by the community and easily adapted for incremental advances in mass spectrometry technology.
In contrast to the well-known XCMS tool, apLCMS can process profile mode data and fits a bi-Gaussian peak shape model to the data, resulting in better peak detection than XCMS. The tool also corrects drifts in retention time and outputs an aligned feature table.
It operates in two modes: unsupervised and hybrid. The unsupervised mode of apLCMS does not rely on any existing knowledge about metabolites or any historically detected features. The hybrid mode, on the other hand, incorporates knowledge of known metabolites and of features historically detected on the same machinery to help detect and quantify lower-intensity peaks. To use such knowledge, especially historical data, you must keep using the same chromatography system (otherwise the retention times will not match) and the same type of samples with a similar extraction technique, such as human serum. Each mode is exposed as a function of the same name (unsupervised and hybrid), parametrised with multiple arguments.
Before running the tests, fetch the required data with the following commands:
wget -P tests/testdata/adjusted -i tests/remote-files/adjusted.txt
wget -P tests/testdata/aligned -i tests/remote-files/aligned.txt
wget -P tests/testdata/extracted -i tests/remote-files/extracted.txt
wget -P tests/testdata/input -i tests/remote-files/input.txt
wget -P tests/testdata/recovered -i tests/remote-files/recovered.txt
wget -P tests/testdata/recovered/recovered-extracted -i tests/remote-files/recovered-extracted.txt
wget -P tests/testdata/recovered/recovered-corrected -i tests/remote-files/recovered-corrected.txt
wget -P tests/testdata/filtered -i tests/remote-files/filtered.txt
wget -P tests/testdata/filtered/run_filter -i tests/remote-files/run_filter.txt
wget -P tests/testdata/features -i tests/remote-files/features.txt
wget -P tests/testdata/clusters -i tests/remote-files/clusters.txt
wget -P tests/testdata/hybrid -i tests/remote-files/hybrid.txt
wget -P tests/testdata/template -i tests/remote-files/template.txt
wget -P tests/testdata/unsupervised -i tests/remote-files/unsupervised.txt
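The same downloads can be driven from a single loop. The following is only a sketch and not part of the repository: the names DATASETS and fetch_all are hypothetical, and the directory/list pairs are copied from the commands above. By default it only prints the wget calls; passing --run executes them.

```shell
# Hypothetical helper: one entry per wget call listed above, written as
# "<target dir under tests/testdata>:<URL list under tests/remote-files>".
DATASETS="
adjusted:adjusted.txt
aligned:aligned.txt
extracted:extracted.txt
input:input.txt
recovered:recovered.txt
recovered/recovered-extracted:recovered-extracted.txt
recovered/recovered-corrected:recovered-corrected.txt
filtered:filtered.txt
filtered/run_filter:run_filter.txt
features:features.txt
clusters:clusters.txt
hybrid:hybrid.txt
template:template.txt
unsupervised:unsupervised.txt
"

# Print every wget command; download only when called with --run.
fetch_all() {
  for entry in $DATASETS; do
    dir="${entry%%:*}"   # text before the colon
    list="${entry#*:}"   # text after the colon
    if [ "${1:-}" = "--run" ]; then
      wget -P "tests/testdata/$dir" -i "tests/remote-files/$list"
    else
      echo "wget -P tests/testdata/$dir -i tests/remote-files/$list"
    fi
  done
}

fetch_all "$@"
```

Printing by default makes it easy to compare the generated commands with the list above before anything is downloaded.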
The hybrid and unsupervised tests are reported to be OS-specific and may fail depending on the platform they are run on. To ensure reproducibility during development, you can run the tests in a designated Docker container as follows:
# from the repository root run
$ docker build -t recetox-aplcms .
After docker build has built the image, run:
$ docker run --rm -t -v $(pwd):/usr/src/recetox-aplcms recetox-aplcms
This will create a container and automatically run all the tests from the tests folder.
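Because the tests depend on the downloaded data, it can help to check that every test-data directory is populated before starting the container. The following is a minimal sketch under that assumption: missing_testdata and EXPECTED_DIRS are hypothetical names, not part of the repository, and the directory list is taken from the wget commands in this README.

```shell
# Hypothetical pre-flight check, not part of the repository.
# Directories that the wget commands in this README populate.
EXPECTED_DIRS="adjusted aligned extracted input recovered recovered/recovered-extracted recovered/recovered-corrected filtered filtered/run_filter features clusters hybrid template unsupervised"

# Print each expected directory under the given root that is missing or empty.
missing_testdata() {
  root="${1:-tests/testdata}"
  for dir in $EXPECTED_DIRS; do
    # A directory counts as populated when it holds at least one entry.
    if [ ! -d "$root/$dir" ] || [ -z "$(ls -A "$root/$dir" 2>/dev/null)" ]; then
      echo "$root/$dir"
    fi
  done
}

# Report what still needs to be fetched; no output means nothing is missing.
missing_testdata tests/testdata
```

The check only looks for the presence of files, so it does not detect incomplete or corrupted downloads.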
The development environment can be set up in two ways, either via VSCode's devcontainer extension or a docker container.
To use a devcontainer you need VSCode with the Remote - Containers extension and Docker installed on your machine:
Run the Remote-Containers: Open Folder in Container command. VSCode may take a few minutes building a container;
Run conda activate recetox-aplcms-dev to activate the Conda environment;
Run R or radian to enter the R terminal (we recommend radian due to its ease of use);
Run devtools::test() and wait until all tests pass to ensure the environment is set up correctly.
To use a Docker development environment you need Docker installed on your machine. If you don't have Docker, you can follow the installation instructions on Docker's website.
Run docker build -t recetox-aplcms . to build an image. This may take a few minutes.
$ docker run -it \
-v $(pwd):/usr/src/recetox-aplcms \
--entrypoint '/bin/bash' \
recetox-aplcms
$ apt update && apt upgrade
$ apt install git && git config --global --add safe.directory /usr/src/recetox-aplcms
Run conda activate recetox-aplcms-dev to activate the Conda environment;
Run R or radian to enter the R terminal (we recommend radian due to its ease of use);
Run devtools::test() and wait until all tests pass to ensure the environment is set up correctly.
Yu, T., Park, Y., Johnson, J. M. & Jones, D. P. apLCMS—adaptive processing of high-resolution LC/MS data. Bioinformatics 25, 1930–1936 (2009). DOI: 10.1093/bioinformatics/btp291.
Yu, T., Park, Y., Li, S. & Jones, D. P. Hybrid Feature Detection and Information Accumulation Using High-Resolution LC–MS Metabolomics Data. J. Proteome Res. 12, 1419–1427 (2013). DOI: 10.1021/pr301053d.
Yu, T. & Jones, D. P. Improving peak detection in high-resolution LC/MS metabolomics data using preexisting knowledge and machine learning approach. Bioinformatics 30, 2941–2948 (2014). DOI: 10.1093/bioinformatics/btu430.
Yu, T. & Peng, H. Quantification and deconvolution of asymmetric LC-MS peaks using the bi-Gaussian mixture model and statistical model selection. BMC Bioinformatics 11, 559 (2010). DOI: 10.1186/1471-2105-11-559.
Liu, Q. et al. Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing. Sci. Rep. 10, 13856 (2020). DOI: 10.1038/s41598-020-70850-0.