Optimus is a workflow for LC-MS-based untargeted metabolomics. It can be used for feature detection, quantification, filtering (e.g. removing background features), annotation, normalization and, finally, for spatial mapping of detected molecular features in 2D and 3D using the `ili app. Optimus employs the state-of-the-art LC-MS feature detection and quantification algorithms of OpenMS, joined into a handy pipeline with KNIME, a modern workflow management system, with additional features implemented by us.
The workflow is being developed by the Alexandrov Team at EMBL Heidelberg (contact information) in collaboration with the Dorrestein Lab at UCSD.
The workflow was initially developed for LC-MS-based metabolite cartography, but it can be useful in almost any LC-MS-based untargeted metabolomics study. Direct-infusion experimental data is also supported. Optimus is developed to be open-source, shareable, and efficient enough to process hundreds of LC-MS runs in a reasonable time.
The workflow consists of the following consecutive steps:
The workflow is performed by KNIME Analytics Platform, an open-source cross-platform general-purpose workflow management system. Before you start using the workflow, you need to install KNIME itself, Python 2.7 (if it's not already installed) and a few additional modules for Python and KNIME. The installation steps are described below. If your computer is running Windows 7 or newer, or macOS 10.10 or newer, you can use the express installation scripts described in the section below. Otherwise, you will need to install the Optimus dependencies manually as per the Regular installation section.
Go to the Releases section of this repository, download a zip archive with the latest Optimus version and unpack it to any directory on your computer. Then, follow the instructions for your OS:
- **Windows users:** open the `installer` subdirectory and double-click `win_installer.cmd`. It should install KNIME and Python automatically. During the installation, you will be prompted to select the KNIME installation directory via a graphical window.
- **macOS users:** open your Terminal, navigate to the `installer` subdirectory and execute `sudo bash mac_installer.sh`.
**All:** After the installation has finished, make sure that the installed Python distribution is recognized correctly by KNIME:

1. Open the `File` menu and select `Preferences`. A `Preferences` dialog should appear.
2. Expand the `KNIME` item and select its `Python` sub-item.
3. Press `Browse...` and select the file `C:\Users\<user name>\AppData\Local\OptimusAnaconda\python` (Windows) or `/Users/<user name>/OptimusMiniconda/bin/python` (macOS).
4. Press `Apply`, then `OK`. If no errors are displayed, you can proceed to the Installing and updating workflow section.

For the regular (manual) installation:

1. Make sure Python 2.7 is installed by executing `python -V` in your command prompt; you should see output starting with "Python 2.7". The second part of the version check is determining the bitness of your Python interpreter; follow this instruction to know whether you have a 64-bit Python or not. Note for Windows users: the Python installer doesn't always add Python to the `Path` environment variable, so you might get an error upon executing `python` in the command prompt although it's installed. To fix this, add `<Python_installation_directory>` and `<Python_installation_directory>\Scripts` to `Path`. By default, these directories are `C:\Python27` and `C:\Python27\Scripts`. You can find an instruction on changing the `Path` variable here.
2. Install the required Python modules with `pip install six pandas protobuf pymspec pyopenms`. On macOS, make sure you have the `pip` package manager available; if you don't, execute `sudo easy_install pip` in the terminal to install it, then run `sudo pip install six pandas protobuf pymspec pyopenms`.
Open `File => Install KNIME Extensions...`. An `Available software` dialog should open after this. Install the following extensions:

- OpenMS
- KNIME Python Integration
- KNIME Quick Forms (legacy)
- KNIME Virtual Nodes
- KNIME JavaScript Views
Note that the procedure described above needs to be completed only once. If you get a new version of the workflow in the future, all you'll have to do is open it with KNIME. Once the steps above are accomplished, your environment is ready to run the workflow.
**Possible Python compatibility issues:** If you have several Python installations on your system, please make sure that KNIME detected the correct one. To do this, go to `File => Preferences`, then type "python" in the filter box. You should see the item `KNIME > Python` at the left-hand side of the dialog. Click on `Python` and check that no error messages appear. If there are any, press `Browse...` and navigate to the Python executable that was used when installing the Python modules above. If you followed the instructions above precisely, you can get the path to the needed Python executable by executing `which python` in a Linux/macOS terminal or `where python` in the Windows command prompt.
It's assumed that you've downloaded the latest Optimus release and extracted it to a directory on your computer.
1. Open `File => Import KNIME Workflow...`. A `Workflow Import Selection` dialog should open after this.
2. For `File`, press `Browse...` and select the `Optimus v...knwf` file from the extracted directory.
3. Press `Finish`.

Now you should see the `Optimus v...` item in the list at the left-hand side of the KNIME window. Double-click on it to open the workflow in the Workflow Editor, where you can change its settings and specify input/output files.
New versions of the workflow appear as new releases in this repository. In order to update the workflow on your local computer to a newer version, repeat the steps above.
The workflow supports mzML and mzXML formats of mass spectrometry data. Internally, all input files will be converted to mzML, so you can save some time on the workflow execution if your data are already in this format. Make sure that input files contain centroided data.
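Optimus itself doesn't ship a check for this, but as a rough heuristic sketch (assuming the file marks spectrum mode with the standard PSI-MS cvParam accessions MS:1000127 for centroid and MS:1000128 for profile spectra), you can scan the mzML text before running the workflow:

```python
def spectrum_mode(mzml_text):
    """Roughly classify mzML content as 'centroid', 'profile', 'mixed' or
    'unknown' by looking for the PSI-MS cvParam accessions MS:1000127
    (centroid spectrum) and MS:1000128 (profile spectrum)."""
    has_centroid = "MS:1000127" in mzml_text
    has_profile = "MS:1000128" in mzml_text
    if has_centroid and has_profile:
        return "mixed"
    if has_centroid:
        return "centroid"
    if has_profile:
        return "profile"
    return "unknown"

# Hypothetical usage on a local file:
# with open("run_01.mzML") as f:
#     print(spectrum_mode(f.read()))
```

If the result is `profile` or `mixed`, centroid the data first (e.g. during conversion to mzML) before feeding it to Optimus.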
KNIME input nodes always require some files to be selected, which doesn't always match the use cases Optimus can handle; for example, you might not have a list of internal standards spiked into your samples. To bypass this restriction and make the workflow run, create a file called "stub.txt" anywhere on your computer and use it as an input file whenever you don't have the files required by an input node.
The first stage of Optimus execution is the creation of a file with details of your experimental design, such as blank runs, replicate runs, etc. This information can be used by Optimus during the data analysis to remove features caused by background signals or features that are not reproducible in replicate runs. The experimental design is a CSV spreadsheet consisting of 4 columns: file path, LC-run type, sample group and replicate group. Optimus will generate a template of the spreadsheet with all file paths filled in, but the other columns should be filled manually according to your study as described below. In the LC-run type column, put `BLANK` in rows corresponding to blank LC-runs and `POOLED_QC` for pooled QC runs.

1. Right-click on the `Read LC-MS runs` node and select `Configure...`. A dialog for input file selection should show up. Press `Clear`, then press `Add`, select the files with your samples and press `OK`.
2. Do the same for the `Read group mapping`, `Read list of internal standards` and `Read annotation source` nodes.
3. Right-click on the `Where to save template of experimental design` node and select `Configure...`. A dialog for output directory selection should show up. Press `Browse...` and specify where the template file with your experimental design will be stored. It's recommended to keep it in the same directory where your LC-MS data is located. Click `OK`.
4. Right-click on the `Generate template of experimental design` node and select `Execute`. The upper part of the workflow should start execution, and the file with the experimental design will be created. The file can then be edited manually according to your experimental design as described above.
5. Right-click on the `Read experimental design` node and select `Configure...`. A file selection dialog should appear. Press `Browse...` and select the experimental design file. Make sure that `read column headers` is checked, `read row IDs` is not checked, and `column delimiter` is set to `,` (comma). Click `OK`.
6. Right-click on the `Display feature heat map` node and select `Execute`. The workflow should start execution; it's finished when the red circle at the lower part of the node turns green. Wait till it happens.
7. Right-click on the `Display feature heat map` node and select `Interactive View: Generic JavaScript View`. A window showing the distribution of detected features will show up.
The display can be adjusted via the `Scale` control at the left-bottom corner of the window.

In order to save the results produced by Optimus, open the configuration dialog of the `Save results` node and specify an output directory. Once the node is executed, it creates 3 files in the output folder. One of them, `features_quantification_matrix.csv`, can be opened in any spreadsheet editor (e.g. Excel). Rows in the table correspond to input samples, whereas columns represent consensus features, i.e. ions of the same type quantified across the runs. Table cells contain intensities of the corresponding features. Column names give information on the corresponding features in the format `mz_value RT charge (ID: numeric_identifier)`; so, for example, a column named `233.112 69 1 (ID: 123)` represents a singly-charged ion with an m/z value of about 233.112 and a chromatographic peak at around 69 seconds. As the reported features are consensus features, you may not find a mass trace in the input samples matching the consensus m/z value and retention time exactly: these figures are averaged across the runs, but the variation is supposed to be low from run to run.
Numeric identifiers (IDs) are assigned to features after the alignment step and are not changed at further steps. For the same input dataset and fixed parameters of feature detection and alignment, the association between IDs and features is guaranteed to remain the same, so the IDs can be used as shortcuts for features.
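The column-name convention described above can be parsed programmatically, e.g. when post-processing the quantification matrix. A minimal sketch (the function name is ours, not part of Optimus):

```python
import re

# Matches an Optimus consensus-feature column name of the form
# "mz_value RT charge (ID: numeric_identifier)", e.g. "233.112 69 1 (ID: 123)".
FEATURE_RE = re.compile(
    r"^(?P<mz>\d+(?:\.\d+)?) (?P<rt>\d+(?:\.\d+)?) (?P<charge>\d+) \(ID: (?P<id>\d+)\)$"
)

def parse_feature_column(name):
    """Return (mz, rt_seconds, charge, feature_id) parsed from a column header."""
    m = FEATURE_RE.match(name)
    if m is None:
        raise ValueError("Unexpected column name format: %r" % name)
    return (float(m.group("mz")), float(m.group("rt")),
            int(m.group("charge")), int(m.group("id")))
```

For the example column above, `parse_feature_column("233.112 69 1 (ID: 123)")` yields the m/z, retention time, charge and ID as numbers.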
Another file produced by the workflow, `Optimus_settings.ini`, is a configuration file that contains the values of all Optimus parameters used to generate the output. This file, along with the experimental design, can be used to reproduce the data analysis.
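Assuming the file follows standard INI syntax (our assumption, not verified against Optimus), it can also be inspected programmatically with Python's `configparser`; the section and parameter names below are purely hypothetical:

```python
from configparser import ConfigParser

# Hypothetical settings content for illustration only; real parameter
# names come from the Optimus_settings.ini produced by your run.
sample = """
[feature_detection]
mass_error_ppm = 10
"""

config = ConfigParser()
config.read_string(sample)
print(config.get("feature_detection", "mass_error_ppm"))  # prints "10"
```

To read your actual file, replace `read_string(sample)` with `config.read("Optimus_settings.ini")`.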
The third file, `OptimusViewer_input.db`, contains extracted ion chromatograms (XICs) and MS/MS spectra for the detected features. The file can be opened with the `OptimusViewer` application, also developed by the Alexandrov team. You can download it and find usage instructions in its GitHub repository.
If you're new to workflow management systems or KNIME in particular, you can find an introductory tutorial on basic features of KNIME here.
This repository contains real-life samples that you can test the workflow on. They're available in this archive (courtesy of Alexey Melnik, Dorrestein Lab, UCSD). Inside, you'll find a directory called `samples` that contains LC-MS samples in mzXML format, ready to be processed with the workflow. Blank samples are separated from the normal ones in the `blanks` directory inside `samples`. They can be used to remove background features from your resulting feature set.
There are also two files in the root folder, `coords.csv` and `Rotten_Apple_Model.stl`. You'll need to supply the former at the last step of the workflow, which produces spatial maps for `ili.
If you want to quickly check what the results of the workflow actually look like, without diving into KNIME and installing everything, you can find the needed file in the `results` folder of the archive. It contains the file `features_mapping.csv`, a spreadsheet with intensities of different features detected in different runs. This file can be visualized in `ili along with `Rotten_Apple_Model.stl`: simply drag and drop both of them into the `ili window.
Below, you can find an example of a spatial map obtained from `ili for a feature that is localized mainly in the vicinity of the rot on the apple.
The workflow has many capabilities that you can discover through the documentation embedded in it. Click on any node, and a description of its role and its parameters will show up in the panel at the right-hand side of the KNIME window. The settings of different nodes don't depend on each other, so you can experiment with different settings and track how the workflow output changes.
Some errors interrupting workflow execution can appear in the application log. The node that caused an error will be the left-most node with a red circle at its right side. In addition, you should see the error output in the KNIME Console. Below, you can find solutions for some common issues.
| Error output or problem | Reason | Solution |
|---|---|---|
| `ValueError: Only one sample labeled as "Replicate group (user-defined)" replicate is found. Please either remove this label or mark other samples with it.` | Incorrect settings of the node reading the experimental design file. | In the configuration dialog of the `Read experimental design` node, check the `read column headers` flag. |
| `ValueError: No internal standard matched detected features. Consider changing settings of feature detection algorithm.` | No features matching the provided list of internal standards were found. | Either change the m/z and/or RT values of your internal standards in the CSV file provided to Optimus, or change settings of the `Detect LC-MS features` node to detect more features, potentially ones corresponding to your internal standards. |
| The computer runs out of hard drive space while Optimus is running. | Temporary files produced by Optimus are too large. | Cancel the workflow execution. Either free up some space or use space on another hard drive for temporary files as follows: make sure an additional hard drive is connected, open the KNIME preferences dialog and, in the `KNIME` section, set `Directory for temporary files` to a location on a hard drive with more free space available. Restart KNIME to apply the new settings. |
| `ValueError: Input list of LC-MS features is empty. Try to change settings of feature detection or your filters.` | No LC-MS features were reported at the end of the workflow. | Try more permissive settings of the `Filter features` node and/or the `Detect LC-MS features` one. Another option to consider is removing the list of ions of interest if you used it for feature annotation by mz-RT matching. |
| `ValueError: Samples without any group reference are found in the experimental design, though groups exist. Please either remove group names completely or assign a group to each LC-run.` | Samples without a study group reference are found. | Include each sample in at least one study group or remove all study groups from the experimental design file. |
| `ValueError: Study groups and replicate groups must not have same names. Following duplicate(s) have been found` | Study groups and replicate identifiers with the same names are found in the experimental design file. | Rename your study groups and/or replicate sample identifiers so that they do not overlap. |
| `ValueError: Input file names must have different base names (names without extensions).` | Some input files have duplicate base names. | Rename the input files with duplicate names and generate the experimental design file again. |
| When executing the `Clean up temporary files` node: `WindowsError: [Error 3] The system cannot find the path specified`. | Internal Windows-specific Python issue when accessing the file system. | Execute the node again; the error should not appear. |
| `ERROR Output Folder Execute failed: Cannot write to containing directoy` | The directory specified for saving the workflow output does not exist. | Create the directory specified for saving the workflow output. |
| `ERROR FeatureFinderMetabo Error: Unexpected internal error (The value '0 0' was used but is not valid! FWHM beginning/ending indices not computed? Aborting...)` | Internal error of the LC-MS feature detection algorithms from the OpenMS library. | Double-click on the `Detect LC-MS features` node; its internal structure should appear. Right-click on `Set advanced FD settings` and select `Configure...` in the drop-down menu. A configuration dialog should appear. Select `fixed` for the `epd_width_filtering` parameter and click `OK`. Close the current KNIME tab to return to the top-level workflow view. Execute the workflow again. |
| `ERROR PythonKernel determination of memory status not supported on this platform, mesauring for memoryleaks will never fail` | Mac-specific message printed by the pyopenms library. It is not an error but a diagnostic message; it does not affect workflow results or performance. | Ignore. |
| `Execute failed: Not all chunks finished - check individual chunk branches for details` | Internal KNIME issue. | Right-click on the node that caused the error and select `Reset` in the drop-down menu. Execute the workflow again. The error should not appear. |
| `Execute failed: java.lang.NullPointerException`<br>`Execute failed: ConcurrentModificationException`<br>`Execute failed: Could not start python kernel`<br>`ERROR FileConverter Execute failed: Failed to execute node FileConverter`<br>`ERROR LoadWorkflowRunnable Errors during load: Status: DataLoadError: Optimus_v_1.0 0 loaded with error during data load` | | Reset the workflow: right-click on the workflow item in KNIME Explorer and select `Reset` in the drop-down menu. Then, execute it again. The error should not appear again. |
The content of this project is licensed under the Apache 2.0 license; see LICENSE.md.