Publication DOI: https://doi.org/10.1186/s13321-023-00724-w
The pipeline consists of seven interconnected steps:
1) File conversion (optional): Simply add your Thermo raw files to data/raw/ and they will be converted to centroid mzML files. If you have Agilent or Bruker files, skip this step: convert them independently using ProteoWizard (see https://proteowizard.sourceforge.io/) and add them to the data/mzML/ directory.
2) Pre-processing: Convert your raw data to a table of metabolic features with a series of algorithms.
3) Re-quantification (optional): Re-quantify all raw files to avoid missing values introduced by the pre-processing workflow before statistical analysis and data exploration.
4) GNPSexport: Generate all the files necessary to create an FBMN or IIMN job at GNPS.
5) Structural and formula predictions with SIRIUS and CSI:FingerID.
6) Annotations: Annotate the feature tables with the #1 ranked SIRIUS and CSI:FingerID predictions (MSI level 3) and with spectral matches from a local MGF file (MSI level 2).
7) Data integration: Integrate the #1 ranked SIRIUS and CSI:FingerID predictions into the GraphML file from GNPS FBMN for visualization. Optionally, annotate the feature tables with GNPS MS/MS library matching annotations (MSI level 2).
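As a rough illustration of how the seven steps fit together, the sketch below models them as an ordered list and marks steps 1 and 3 as optional, as described above. The `Step` class and the `planned_steps` helper are hypothetical names used only for this example; they are not part of the workflow's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    optional: bool = False


# Step names and optional flags follow the list above.
# The data structure itself is an illustrative assumption.
PIPELINE = [
    Step(1, "File conversion", optional=True),
    Step(2, "Pre-processing"),
    Step(3, "Re-quantification", optional=True),
    Step(4, "GNPSexport"),
    Step(5, "SIRIUS and CSI:FingerID predictions"),
    Step(6, "Annotations"),
    Step(7, "Data integration"),
]


def planned_steps(skip_optional=()):
    """Return the steps to run, leaving out optional steps named in skip_optional."""
    skip = set(skip_optional)
    unknown = skip - {s.name for s in PIPELINE if s.optional}
    if unknown:
        raise ValueError(f"not an optional step: {sorted(unknown)}")
    return [s for s in PIPELINE if s.name not in skip]


if __name__ == "__main__":
    # Example: Agilent or Bruker users start from mzML files and skip step 1.
    for step in planned_steps(skip_optional=["File conversion"]):
        print(f"{step.number}) {step.name}")
```

Only the optional steps can be skipped in this sketch; asking to skip any other step raises a `ValueError`.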
Clone this repository to your local system, into the place where you want to perform the data analysis.
(Make sure you have the right access / SSH key. If not, follow these steps: Step 1: https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent)
git clone https://github.com/eeko-kon/pyOpenMS_UmetaFlow.git
Mono, homebrew and wget dependencies:
For Linux only(!)
Install mono with sudo:
sudo apt install mono-devel
For both systems
Install homebrew and wget:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Press enter (RETURN) to continue
For Linux only(!)
Follow the Next steps instructions to add Linuxbrew to your PATH and to your bash shell profile script, either ~/.profile on Debian/Ubuntu or ~/.bash_profile on CentOS/Fedora/RedHat (https://github.com/Linuxbrew/brew).
test -d ~/.linuxbrew && eval $(~/.linuxbrew/bin/brew shellenv)
test -d /home/linuxbrew/.linuxbrew && eval $(/home/linuxbrew/.linuxbrew/bin/brew shellenv)
test -r ~/.bash_profile && echo "eval \$($(brew --prefix)/bin/brew shellenv)" >>~/.bash_profile
echo "eval \$($(brew --prefix)/bin/brew shellenv)" >>~/.profile
For both systems
brew install wget
pyOpenMS and other libraries: Installing pyOpenMS in a conda environment is advised. First create and activate the environment, then install the latest pyOpenMS wheels and the remaining dependencies:
conda create --name pyopenms python=3.10
conda activate pyopenms
pip install --index-url https://pypi.cs.uni-tuebingen.de/simple/ pyopenms-nightly
conda install -n pyopenms ipykernel --update-deps --force-reinstall
pip install pyteomics
pip install --upgrade nbformat
pip install matplotlib
For installation details and further documentation, see: pyOpenMS documentation.
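After installation, a quick check that the packages are visible to the active environment can save debugging time later. The following is a minimal sketch using only the standard library; the module list assumes that each package installed above is imported under the name shown (for example, that the pyopenms-nightly wheel provides the `pyopenms` module), and `missing_modules` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = ["pyopenms", "pyteomics", "nbformat", "matplotlib", "ipykernel"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it with the pyopenms environment activated; a module reported as missing usually means the corresponding install command was run in a different environment.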
ThermoRawFileParser
(cd resources/ThermoRawFileParser && wget https://github.com/compomics/ThermoRawFileParser/releases/download/v1.3.4/ThermoRawFileParser.zip && unzip ThermoRawFileParser.zip)
SIRIUS
Download the latest SIRIUS executable manually from here until it is available as a conda-forge installation. Choose the headless zipped file compatible with your operating system (Linux, macOS or Windows) and unzip it under the directory resources/. Make sure to register using your university email and password.
(cd resources/ && wget https://github.com/boecker-lab/sirius/releases/download/v5.6.2/sirius-5.6.2-linux64-headless.zip && unzip *.zip)
(cd data && wget -O Commercial_std_raw.zip "https://zenodo.org/record/6948449/files/Commercial_std_raw.zip?download=1" && unzip Commercial_std_raw.zip -d raw)
The data can be used for testing the workflow. Otherwise, simply transfer your own data to the directory data/raw/ or data/mzML/.
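Before starting a run, it can help to confirm that the input files are where the workflow expects them. The sketch below lists Thermo raw files under data/raw/ and mzML files under data/mzML/; the `find_inputs` helper and the suffix-based matching (`.raw`, `.mzML`) are assumptions made for this example, not part of the workflow itself.

```python
from pathlib import Path


def _files_with_suffix(folder, suffix):
    """Return sorted files in folder whose suffix matches, ignoring case."""
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


def find_inputs(data_dir="data"):
    """List Thermo raw files and mzML files found under data_dir."""
    data = Path(data_dir)
    return {
        "raw": _files_with_suffix(data / "raw", ".raw"),
        "mzML": _files_with_suffix(data / "mzML", ".mzml"),
    }


if __name__ == "__main__":
    found = find_inputs()
    print(f"{len(found['raw'])} raw file(s), {len(found['mzML'])} mzML file(s)")
    if not found["raw"] and not found["mzML"]:
        print("No input files found in data/raw/ or data/mzML/.")
```

A missing directory is treated the same as an empty one, so the check is safe to run on a fresh clone.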
All the results are in .tsv format and can be opened with Excel or loaded as pandas DataFrames.
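For the pandas route, a tab-separated file can be read with `pandas.read_csv` and `sep="\t"`. The snippet below is a minimal sketch: `load_feature_table` is a hypothetical helper, and the column names in the demo are invented stand-ins, not the actual columns written by the workflow.

```python
import io

import pandas as pd


def load_feature_table(source):
    """Read a tab-separated results table (path or file-like object) into a DataFrame."""
    return pd.read_csv(source, sep="\t")


if __name__ == "__main__":
    # Tiny in-memory stand-in for a results file; column names are hypothetical.
    demo = io.StringIO("feature_id\tmz\tRT\n1\t301.1410\t125.3\n2\t455.2902\t310.8\n")
    table = load_feature_table(demo)
    print(table.head())
```

To load a real result, pass the path of the .tsv file instead of the in-memory demo.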
Kontou EE, Walter A, Alka O, et al. UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis. J Cheminform. 2023;15:52. doi:10.1186/s13321-023-00724-w
Pfeuffer J, Sachsenberg T, Alka O, et al. OpenMS – A platform for reproducible analysis of mass spectrometry data. J Biotechnol. 2017;261:142-148. doi:10.1016/j.jbiotec.2017.05.016
Dührkop K, Fleischauer M, Ludwig M, et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat Methods. 2019;16(4):299-302. doi:10.1038/s41592-019-0344-8
Dührkop K, Shen H, Meusel M, Rousu J, Böcker S. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc Natl Acad Sci. 2015;112(41):12580-12585. doi:10.1073/pnas.1509788112
Nothias LF, Petras D, Schmid R, et al. Feature-based molecular networking in the GNPS analysis environment. Nat Methods. 2020;17(9):905-908. doi:10.1038/s41592-020-0933-6