A tool to build a tree of mass-spectrometry (LC-MS/MS) features to perform chemically-informed comparison of untargeted metabolomic profiles. The manuscript describing q2-qemistree is available here.
Once QIIME 2 is installed, activate your QIIME 2 environment and install q2-qemistree following the steps below:
git clone https://github.com/biocore/q2-qemistree.git
cd q2-qemistree
pip install .
qiime dev refresh-cache
q2-qemistree uses SIRIUS, a software framework developed for de novo identification of metabolites. We use molecular substructures predicted by SIRIUS to build a hierarchy of the MS1 features in a dataset. For this demo, please download and unzip the latest version of SIRIUS from here.
Below, we download SIRIUS for macOS (for Linux, the only thing that changes is the URL from which the binary is downloaded):
wget https://bio.informatik.uni-jena.de/repository/dist-release-local/de/unijena/bioinf/ms/sirius/4.9.3/sirius-4.9.3-osx64-headless.zip
unzip sirius-4.9.3-osx64-headless.zip
Note: Qemistree was initially developed against SIRIUS 4.0.1. Since SIRIUS 4.0.1 reached its end of life, Qemistree has been adapted to work with newer SIRIUS versions (>4.4.29).
q2-qemistree ships with the following methods:
qiime qemistree compute-fragmentation-trees
qiime qemistree rerank-molecular-formulas
qiime qemistree predict-fingerprints
qiime qemistree make-hierarchy
qiime qemistree get-classyfire-taxonomy
qiime qemistree prune-hierarchy
To generate a tree that relates the MS1 features in your experiment, we need to pre-process mass-spectrometry data (.mzXML, .mzML or .mzDATA files) using MZmine2 and produce the following inputs:
An MGF file, which is imported as a MassSpectrometryFeatures artifact.
A feature table, which is imported as a FeatureTable[Frequency] artifact.
These input files can be obtained following peak detection in MZmine2. Here is an example MZmine2 batch file used to generate these.
To begin this demonstration, create a separate folder to store all the inputs and outputs:
mkdir demo-qemistree
cd demo-qemistree
Download a small feature table and MGF file using:
wget https://raw.githubusercontent.com/biocore/q2-qemistree/master/q2_qemistree/demo/feature-table.biom
wget https://raw.githubusercontent.com/biocore/q2-qemistree/master/q2_qemistree/demo/sirius.mgf
We import these files into the appropriate QIIME 2 artifact formats as follows:
qiime tools import --input-path feature-table.biom --output-path feature-table.qza --type 'FeatureTable[Frequency]'
qiime tools import --input-path sirius.mgf --output-path sirius.mgf.qza --type MassSpectrometryFeatures
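Since the import expects every MS1 entry in the MGF to have a corresponding MS2 entry, a quick pre-check can save a round trip. The following is a minimal sketch, not part of q2-qemistree: the helper name `mgf_ms1_without_ms2` is hypothetical, and it assumes each `BEGIN IONS`/`END IONS` block carries `FEATURE_ID=` and `MSLEVEL=` lines, which may not hold for every MGF export.

```shell
# Hypothetical helper: print FEATURE_ID values that have an MS1 block
# but no MS2 block. Assumes FEATURE_ID= and MSLEVEL= lines inside each
# BEGIN IONS / END IONS block (an assumption about the MGF layout).
mgf_ms1_without_ms2() {
  awk '
    { sub(/\r$/, "") }                              # tolerate CRLF line endings
    /^BEGIN IONS/  { id = ""; level = "" }          # reset state for a new block
    /^FEATURE_ID=/ { id = substr($0, 12) }          # text after "FEATURE_ID="
    /^MSLEVEL=/    { level = substr($0, 9) }        # text after "MSLEVEL="
    /^END IONS/ {
      if (id != "" && level == "1") ms1[id] = 1
      if (id != "" && level == "2") ms2[id] = 1
    }
    END { for (id in ms1) if (!(id in ms2)) print id }
  ' "$1" | sort
}

# Only run the check when the demo file is present.
if [ -f sirius.mgf ]; then
  mgf_ms1_without_ms2 sirius.mgf
fi
```

An empty result means every MS1 feature found by this check has at least one MS2 block.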
Note: If the MGF file has formatting errors (e.g. no MS1 entries are included in the MGF, or an MS1 entry does not have a corresponding MS2 entry), an error message will help users troubleshoot this step before proceeding.
First, we generate fragmentation trees for molecular peaks detected using MZmine2:
qiime qemistree compute-fragmentation-trees --p-sirius-path 'sirius.app/Contents/MacOS' \
--i-features sirius.mgf.qza \
--p-ppm-max 15 \
--p-profile orbitrap \
--p-ions-considered '[M+H]+' \
--p-java-flags "-Djava.io.tmpdir=/path-to-some-dir/ -Xms16G -Xmx64G" \
--o-fragmentation-trees fragmentation_trees.qza
Note: /path-to-some-dir/ should be a directory where you have write permissions and sufficient storage space. We use -Xms16G and -Xmx64G as the minimum and maximum heap size for the Java virtual machine (JVM). If left blank, q2-qemistree will use default JVM flags.
This generates a QIIME 2 artifact of type SiriusFolder. This contains fragmentation trees with candidate molecular formulas for each MS1 feature detected in your experiment.
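The string passed to --p-java-flags can be assembled once and reused across the SIRIUS-based steps. Below is a minimal sketch under stated assumptions: `build_java_flags` is a hypothetical helper (not a q2-qemistree command), and `/tmp` stands in for a directory of your choosing with enough free space.

```shell
# Hypothetical helper: check that the temp directory is writable, then
# print a flag string in the same form as the commands above.
build_java_flags() {
  tmp_dir=$1
  min_heap=$2
  max_heap=$3
  if [ ! -d "$tmp_dir" ] || [ ! -w "$tmp_dir" ]; then
    echo "error: $tmp_dir is not a writable directory" >&2
    return 1
  fi
  printf '%s\n' "-Djava.io.tmpdir=${tmp_dir} -Xms${min_heap} -Xmx${max_heap}"
}

# /tmp is only a placeholder; replace it with your own directory.
JAVA_FLAGS=$(build_java_flags /tmp 16G 64G)
echo "$JAVA_FLAGS"
```

The resulting variable can then be passed as `--p-java-flags "$JAVA_FLAGS"`.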
Note 2: Newer SIRIUS versions have the parameter --p-ions-considered, which specifies the adduct(s) of the MS/MS data to be considered. Here are some examples: [M+H]+, [M+K]+, [M+Na]+, [M+H-H2O]+, [M+H-H4O2]+, [M+NH4]+, [M-H]-, [M+Cl]-, [M-H2O-H]-, [M+Br]-.
You can also provide a comma-separated list. Example: '[M+H]+, [M+Na]+'.
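When several adducts are considered, building the comma-separated value from individual entries avoids quoting mistakes. This is a small sketch; `join_adducts` is a hypothetical helper name, and the separator simply mirrors the example above.

```shell
# Hypothetical helper: join adduct strings with ", " as in the example above.
join_adducts() {
  out=""
  for adduct in "$@"; do
    if [ -z "$out" ]; then
      out=$adduct
    else
      out="$out, $adduct"
    fi
  done
  printf '%s\n' "$out"
}

IONS=$(join_adducts '[M+H]+' '[M+Na]+')
echo "$IONS"   # prints: [M+H]+, [M+Na]+
```

The value can then be passed as `--p-ions-considered "$IONS"`; the double quotes keep the brackets from being interpreted by the shell.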
Next, we select the top-scoring molecular formula as follows:
qiime qemistree rerank-molecular-formulas --p-sirius-path 'sirius.app/Contents/MacOS' \
--i-features sirius.mgf.qza \
--i-fragmentation-trees fragmentation_trees.qza \
--p-zodiac-threshold 0.95 \
--p-java-flags "-Djava.io.tmpdir=/path-to-some-dir/ -Xms16G -Xmx64G" \
--o-molecular-formulas molecular_formulas.qza
This produces a QIIME 2 artifact of type ZodiacFolder with the top-ranked molecular formula for each MS1 feature.
Now, we predict molecular substructures in each feature based on the molecular formulas. We use CSI:FingerID for this purpose as follows:
qiime qemistree predict-fingerprints --p-sirius-path 'sirius.app/Contents/MacOS' \
--i-molecular-formulas molecular_formulas.qza \
--p-ppm-max 20 \
--p-java-flags "-Djava.io.tmpdir=/path-to-some-dir/ -Xms16G -Xmx64G" \
--o-predicted-fingerprints fingerprints.qza
This gives us a QIIME 2 artifact of type CSIFolder that contains probabilities of molecular substructures (2936 molecular properties in total) within each feature.
We use these predicted molecular substructures to generate a hierarchy of molecules as follows:
qiime qemistree make-hierarchy \
--i-csi-results fingerprints.qza \
--i-feature-tables feature-table.qza \
--o-tree qemistree.qza \
--o-feature-table feature-table-hashed.qza \
--o-feature-data feature-data.qza
To support meta-analyses, this method can handle one or more datasets, i.e. pairs of CSI results and feature tables. To test this functionality, download a feature table and CSI fingerprint result from another experiment as follows:
wget https://raw.githubusercontent.com/biocore/q2-qemistree/master/q2_qemistree/demo/feature-table2.biom.qza
wget https://raw.githubusercontent.com/biocore/q2-qemistree/master/q2_qemistree/demo/fingerprints2.qza
Below is the q2-qemistree command to co-analyze the two datasets:
qiime qemistree make-hierarchy \
--i-csi-results fingerprints.qza \
--i-csi-results fingerprints2.qza \
--i-feature-tables feature-table.qza \
--i-feature-tables feature-table2.biom.qza \
--o-tree merged-qemistree.qza \
--o-feature-table merged-feature-table-hashed.qza \
--o-feature-data merged-feature-data.qza
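With more than two datasets, keeping each CSI result next to its feature table helps avoid mismatched pairs. The sketch below is a dry run that only prints the command; `make_hierarchy_cmd` is a hypothetical helper, it assumes the i-th CSI result belongs with the i-th feature table, and it does not handle paths containing spaces.

```shell
# Hypothetical helper: takes arguments as pairs (csi-result feature-table ...)
# and prints the corresponding make-hierarchy command without running it.
make_hierarchy_cmd() {
  if [ $# -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
    echo "error: expected pairs of <csi-result> <feature-table>" >&2
    return 1
  fi
  csi_args=""
  table_args=""
  while [ $# -gt 0 ]; do
    csi_args="$csi_args --i-csi-results $1"
    table_args="$table_args --i-feature-tables $2"
    shift 2
  done
  printf '%s\n' "qiime qemistree make-hierarchy$csi_args$table_args --o-tree merged-qemistree.qza --o-feature-table merged-feature-table-hashed.qza --o-feature-data merged-feature-data.qza"
}

make_hierarchy_cmd fingerprints.qza feature-table.qza \
                   fingerprints2.qza feature-table2.biom.qza
```

Printing the command first makes it easy to review the pairing before running it.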
Qemistree also supports the inclusion of structural annotations made using MS/MS spectral library matches for downstream analysis, using the optional input --i-ms2-matches as follows:
qiime qemistree make-hierarchy \
--i-csi-results fingerprints.qza \
--i-feature-tables feature-table.qza \
--i-ms2-matches path-to-MS2-spectral-matches.qza \
--o-tree qemistree.qza \
--o-feature-table feature-table-hashed.qza \
--o-feature-data feature-data.qza
Note: --i-ms2-matches can be obtained using the Feature-based molecular networking (FBMN) workflow supported in the web-based mass-spectrometry data analysis platform, GNPS. To use MS2 matches in Qemistree, please download the results of the FBMN workflow and import the tsv file in the folder clusterinfo_summary as a QIIME 2 artifact of type FeatureData[Molecules] as follows:
qiime tools import \
--input-path path-to-MS2-spectral-matches.tsv \
--output-path path-to-MS2-spectral-matches.qza \
--type 'FeatureData[Molecules]'
This method generates the following:
A feature table (--o-feature-table) of type FeatureTable[Frequency]. The features it retains can be influenced by upstream parameters (--p-ppm-max, --p-zodiac-threshold).
A tree (--o-tree) of type Phylogeny[Rooted]. By default, we retain all fingerprint positions (i.e. 2936 molecular properties). Adding --p-qc-properties filters these properties to keep only PubChem fingerprint positions (489 molecular properties) in the contingency table. Note: The latest release of SIRIUS uses the PubChem version downloaded on 13 August 2017.
Feature data (--o-feature-data) of type FeatureData[Molecules]. For each feature, this includes the parent mass (parent_mass), retention time (retention_time), CSI:FingerID structure predictions (csi_smiles), MS2 match structure predictions (ms2_smiles), and the table(s) (table_number) that the feature was detected in. (The renaming of features helps prevent overlap between non-unique feature identifiers in the original feature tables in case of meta-analyses.)
These can be used as inputs to perform chemical phylogeny-based alpha-diversity and beta-diversity analyses.
Furthermore, Qemistree supports the classification of molecules into the Classyfire chemical taxonomy. We generate a feature data table (also of the type FeatureData[Molecules]) which includes classification of molecules into chemical 'kingdom', 'superclass', 'class', 'subclass', and 'direct_parent'. We can run Classyfire using Qemistree as follows:
qiime qemistree get-classyfire-taxonomy \
--i-feature-data merged-feature-data.qza \
--o-classified-feature-data classified-merged-feature-data.qza
Qemistree will use ms2_smiles to make chemical taxonomy assignments when MS2 matches are available for a feature. Otherwise, csi_smiles will be used. The column structure_source in classified-merged-feature-data.qza records whether the taxonomic assignment was made using CSI:FingerID predictions or MS/MS library matches.
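The preference for MS2 matches over CSI:FingerID predictions can be illustrated on a tab-separated export of the feature data. This is only a sketch of the selection rule described above, not Qemistree's implementation: `pick_structure_source` is a hypothetical helper, it treats an empty cell as "no structure available", and the labels `MS2`, `CSI` and `none` are illustrative and may not match the values stored in structure_source.

```shell
# Hypothetical helper: for each row of a TSV with ms2_smiles and csi_smiles
# columns, prefer the MS2 structure and fall back to the CSI:FingerID one.
# Empty cells are treated as missing (an assumption for this sketch).
pick_structure_source() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i        # map column name -> index
      print "id", "smiles", "source"
      next
    }
    {
      ms2 = $(col["ms2_smiles"])
      csi = $(col["csi_smiles"])
      if (ms2 != "")      print $1, ms2, "MS2"
      else if (csi != "") print $1, csi, "CSI"
      else                print $1, "", "none"
    }
  ' "$1"
}
```

The first column is taken to be the feature identifier; the remaining columns are located by name, so their order in the file does not matter.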
Lastly, Qemistree includes some utility functions that are useful to visualize and explore the molecular hierarchy generated above. Qemistree trees can be visualized using q2-empress [preprint]. Below are the installation instructions, which can be run within your QIIME 2 environment:
pip uninstall --yes emperor
pip install git+https://github.com/biocore/empress.git
qiime dev refresh-cache
qiime qemistree prune-hierarchy \
--i-feature-data classified-merged-feature-data.qza \
--p-column class \
--i-tree merged-qemistree.qza \
--o-pruned-tree merged-qemistree-class.qza
Users can choose any of the data columns (--p-column) in the classified-merged-feature-data.qza file to prune the hierarchy, for example '#featureID', 'kingdom', 'superclass', 'class', 'subclass', 'direct_parent', and 'smiles'. All features with no data in this column will be removed from the phylogeny.
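To preview which features would survive pruning on a given column, the same rule can be applied to a tab-separated export of the classified feature data. This is a sketch under stated assumptions: `features_with_value` is a hypothetical helper, the first column is assumed to hold the feature identifier, and "no data" is approximated as an empty cell, which may differ from how Qemistree marks unclassified features.

```shell
# Hypothetical helper: print the identifiers (first column) of rows whose
# value in the named column is non-empty.
features_with_value() {
  awk -F'\t' -v want="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == want) idx = i
      if (!idx) { print "column not found: " want > "/dev/stderr"; exit 1 }
      next
    }
    $idx != "" { print $1 }
  ' "$1"
}
```

For example, `features_with_value feature-data.tsv class` would list the features that have a 'class' annotation, where `feature-data.tsv` is a placeholder for your own exported file.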
qiime empress community-plot \
--i-tree merged-qemistree-class.qza \
--i-feature-table feature-table-hashed.qza \
--m-sample-metadata-file path-to-sample-metadata.tsv \
--m-feature-metadata-file classified-merged-feature-data.qza \
--o-visualization empress-tree.qzv
The output Empress QZV can be visualized using QIIME 2 View; Empress can be used to interactively modify the tree visualization. Below is an example visualization from the Empress preprint. Here, the user has sample metadata columns (food sources) to compare groups of food samples; Empress enables them to visualize metabolite relative prevalence as bar charts at the tips of the tree.
Please visit the Empress tutorial for all the currently supported tree visualization features that can be leveraged to explore the chemical diversity of your metabolomics dataset.