Are you interested in contextualising brain maps, for example derived from case-control comparisons, fMRI tasks, or spatial meta-analyses, across biological systems ranging from the molecular and cellular levels to disease associations? ABAnnotate uses spatial gene expression patterns to derive neuroimaging phenotype-gene associations and assesses the overrepresentation of associated genes in several multimodal gene-category datasets.
(Note: ABAnnotate inherited its license from its source toolbox. Integrated datasets, especially data from the Allen Institute for Brain Science, are licensed under non-commercial licenses, which must be taken into account when using ABAnnotate.)
ABAnnotate is a Matlab-based toolbox to perform ensemble-based gene-category enrichment analysis (GCEA) on volumetric human neuroimaging data via brain-wide gene expression patterns derived from the Allen Human Brain Atlas (ABA). It applies a nonparametric method developed by Fulcher et al. (2021) that uses spatial autocorrelation-corrected phenotype null maps to estimate gene-category null ensembles. ABAnnotate was adapted from Fulcher et al.'s toolbox, which was originally designed to annotate imaging data with GeneOntology categories. The function to generate null models, along with some utility functions, was taken from the JuSpace toolbox by Dukart et al. (2021).
ABAnnotate is under development. It works out of the box, but you may well encounter bugs when using it. Please feel free to report them by opening an issue or contacting me.
The method basically consists of the following steps:
ABAnnotate extends Fulcher et al.'s toolbox by:
All datasets (atlases, ABA data, GCEA datasets) are stored on OSF. Source information is provided in dataset_sources.csv, which can be loaded and updated from OSF via:
sources_table = abannotate_get_sources;
ABAnnotate automatically downloads selected datasets to the two folders \atlas (parcellation volumes and parcel-wise ABA data) and \datasets (GCEA datasets). You can also download the data manually from OSF and save it in the respective folders.
The toolbox relies heavily on ABA data, which were imported through the abagen toolbox using its default settings. For each parcellation, an associated {atlas_name}_report.md file documents the processing done by abagen.
Currently, three parcellations are implemented: a functionally defined parcellation combining 100 cortical (Schaefer et al., 2018) and 16 subcortical parcels (Tian et al., 2020); a second version of this parcellation with only the 100 cortical parcels; and the anatomically defined whole-brain Neuromorphometrics atlas, from which 8 regions without ABA data (31, 72, 118, 121, 148, 149, 156, 174) were removed, leaving 111 parcels.
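Removing parcels that received no ABA tissue samples, as done for the Neuromorphometrics atlas, amounts to a simple filter over the parcel-wise expression data. The Python sketch below is only an illustration (ABAnnotate itself is Matlab, and the data were prepared with abagen); the `{parcel_label: values}` layout and the function name are hypothetical, with `None` standing in for a parcel without samples.

```python
def drop_empty_parcels(expression):
    """Split parcels into those with expression data and those without.

    `expression` is a hypothetical {parcel_label: [values] or None} mapping;
    None (or a list containing only None) marks a parcel with no ABA sample.
    """
    kept, removed = {}, []
    for label, values in expression.items():
        if values is None or all(v is None for v in values):
            removed.append(label)
        else:
            kept[label] = values
    return kept, sorted(removed)

# Toy example: 5 parcels, 2 of which received no tissue samples.
toy = {1: [0.2, 0.5], 2: None, 3: [0.1, 0.9], 4: [None, None], 5: [0.7, 0.3]}
kept, removed = drop_empty_parcels(toy)
print(len(kept), removed)  # 3 [2, 4]
```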
See example/customization.md for information on how to import your own ABA data (e.g., if you want to use a custom parcellation or alter the processing of ABA mRNA expression data).
Current GCEA datasets include:
To get a list of all available GCEA datasets, run:
abannotate_get_datasets;
Output:
Available GCEA datasets:
- ABA-brainSpan-weights
- DAVID-chromosome-discrete
- DAVID-cytogenicLocation-discrete
- DisGeNET-diseaseCuratedAll-discrete
- DisGeNET-diseaseCuratedMental-discrete
- DisGeNET-diseaseAllAll-discrete
- DisGeNET-diseaseAllMentalBehav-discrete
- GO-biologicalProcessDirect-discrete
- GO-biologicalProcessProp-discrete
- GO-molecularFunctionDirect-discrete
- GO-molecularFunctionProp-discrete
- GO-cellularComponentDirect-discrete
- GO-cellularComponentProp-discrete
- PsychEncode-cellTypesTPM-discrete
- PsychEncode-cellTypesUMI-discrete
Please note that, while ABAnnotate is published under a GPL-3.0 license, which allows for commercial use, associated datasets are protected by other licenses (e.g., ABA data may not be used commercially, and DisGeNET data are protected under a CC BY-NC-SA 4.0 license). Where available, these licenses are listed in dataset_sources.csv. This effectively renders ABAnnotate, if used as is, unsuitable for commercial use!
ABAnnotate was coded in Matlab R2021a. For generation of phenotype null maps, it depends on the SPM12 image calculator. It uses the Parallel Computing Toolbox to generate null phenotypes and compute correlations. It requires an internet connection to download parcellations, ABA data, and GCEA datasets from OSF.
The simplest use case requires only a NIfTI volume in MNI space and the selection of one of the GCEA datasets provided with ABAnnotate. The code below performs a GCEA on an input volume with 1000 null maps corrected for spatial autocorrelation, using GeneOntology "Biological Process" categories whose annotated genes are propagated upwards through the GeneOntology hierarchy (as opposed to using only direct annotations between categories and genes). Phenotype-gene associations are computed as Spearman correlations, and category scores are estimated as the average of the r-to-Z-transformed correlation coefficients.
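The scoring step described above (Spearman correlation per gene, Fisher r-to-Z transform, averaging over a category) can be sketched in a few lines. This is a language-agnostic Python illustration, not ABAnnotate's Matlab code; it assumes the phenotype and each gene's expression are given as parcel-wise lists in the same parcel order, and the gene names are hypothetical. Applying the same score to every null phenotype map yields the category null ensemble used by the nonparametric test.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    # Spearman's rho is Pearson's r computed on ranks
    return pearson(ranks(x), ranks(y))

def category_score(phenotype, expression, genes):
    """Mean Fisher r-to-Z of phenotype-gene Spearman correlations."""
    zs = []
    for g in genes:
        r = spearman(phenotype, expression[g])
        r = max(min(r, 1 - 1e-12), -(1 - 1e-12))  # keep atanh finite at |r| = 1
        zs.append(math.atanh(r))
    return sum(zs) / len(zs)

def null_scores(null_phenotypes, expression, genes):
    """Category null ensemble: the same score for every null phenotype map."""
    return [category_score(p, expression, genes) for p in null_phenotypes]

# Toy data: 5 parcels, one category with 2 (hypothetical) genes.
phenotype = [1, 2, 3, 4, 5]
expression = {"GENE_A": [1, 3, 2, 5, 4], "GENE_B": [4, 5, 3, 2, 1]}
score = category_score(phenotype, expression, ["GENE_A", "GENE_B"])
print(round(score, 4))  # -0.1868, the mean of atanh(0.8) and atanh(-0.9)
```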
Download the toolbox and add it to the Matlab path:
startup;
All options are defined in a struct array:
opt.analysis_name = 'GCEA_GeneOntology'; % name for analysis
opt.phenotype = '/path/to/input/volume.nii'; % input "phenotype" volume
opt.dir_result = '/path/to/save/output'; % output directory
opt.GCEA.dataset = 'GO-biologicalProcessProp-discrete'; % selected GCEA dataset
Run:
results_table = ABAnnotate(opt);
You can define various options and provide precomputed data (see below). You can also use your own parcellation, but will then have to generate a custom ABA gene expression dataset. All options are shown in example/customization.md.
opt.analysis_name = 'GCEA_GeneOntology'; % name for analysis
opt.phenotype = '/path/to/input/volume.nii'; % input "phenotype" volume
opt.phenotype_nulls = '/path/to/precomputed/phenotype_nulls.mat'; % use already computed phenotype nulls
opt.n_nulls = 1000; % number of null phenotypes/categories, will be overwritten with n nulls from .phenotype_nulls
opt.atlas = 'SchaeferTian'; % one of {'SchaeferTian', 'Neuromorphometrics', 'Schaefer'}
opt.dir_result = '/path/to/save/output'; % output directory
opt.GCEA.dataset = 'GO-biologicalProcessProp-discrete'; % selected GCEA dataset
opt.GCEA.size_filter = [5, 200]; % select categories with between 5 and 200 annotated genes
opt.GCEA.correlation_method = 'Spearman'; % one of {'Spearman', 'Pearson'}
opt.GCEA.aggregation_method = 'mean'; % one of {'mean', 'absmean', 'median', 'absmedian', 'weightedmean', 'absweightedmean'}
opt.GCEA.p_tail = 'right'; % one of {'right', 'left'}
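To illustrate what `opt.GCEA.size_filter` and `opt.GCEA.aggregation_method` control, here is a minimal Python sketch (not ABAnnotate's implementation). Two points are assumptions: the `abs*` variants are taken to use absolute gene-level values before aggregating, and the size bounds are treated as inclusive.

```python
import statistics

def aggregate(z, method="mean", weights=None):
    """Collapse one category's per-gene Z values into a category score.

    Assumption: the 'abs*' variants take absolute values of the gene-level
    scores before aggregating.
    """
    if method.startswith("abs"):
        z = [abs(v) for v in z]
        method = method[len("abs"):]
    if method == "mean":
        return statistics.fmean(z)
    if method == "median":
        return statistics.median(z)
    if method == "weightedmean":
        if weights is None or len(weights) != len(z):
            raise ValueError("weightedmean needs one weight per gene")
        return sum(w * v for w, v in zip(weights, z)) / sum(weights)
    raise ValueError(f"unknown aggregation method: {method}")

def size_filter(categories, lo=5, hi=200):
    """Keep categories with lo..hi annotated genes (bounds assumed inclusive)."""
    return {name: genes for name, genes in categories.items()
            if lo <= len(genes) <= hi}

z = [-1.0, 2.0, 3.0]
print(aggregate(z, "absmean"), aggregate(z, "weightedmean", [1, 1, 2]))  # 2.0 1.75
```

Absolute-value variants are useful when a category may contain genes correlated with the phenotype in both directions, which would cancel out under a plain mean.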
ABAnnotate can incorporate "continuous" GCEA datasets with gene expression values across the whole genome for each category. This currently applies only to the BrainSpan dataset. You can choose your own thresholding settings to define marker genes and weight each gene-phenotype correlation by the gene's expression value when calculating category scores:
opt.analysis_name = 'GCEA_BrainSpan'; % name for analysis
opt.phenotype = '/path/to/input/volume.nii'; % input "phenotype" volume
opt.dir_result = '/path/to/save/output'; % output directory
opt.GCEA.dataset = 'ABA-brainSpan-weights'; % selected GCEA dataset
opt.GCEA.aggregation_method = 'weightedmean'; % one of {'mean', 'absmean', 'median', 'absmedian', 'weightedmean', 'absweightedmean'}
opt.GCEA.weights_quant = 0.90; % retain only genes with expression values above the 0.9 quantile (90th percentile) of the whole dataset
opt.GCEA.weights_cutoff = false; % if true, binarize expression values -> standard mean will be calculated. If false, use weighted mean
opt.GCEA.gene_coocc_thresh = 0.2; % retain only genes annotated to 20% or less of categories after weight thresholding
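The three thresholding options above can be read as a sequence of filters over a category-by-gene weight table. The Python sketch below follows that reading and is not taken from ABAnnotate. Its assumptions: the quantile uses linear interpolation, which may differ from the rule the toolbox uses; co-occurrence is measured as a share of all categories; and the `{category: {gene: expression}}` layout is hypothetical.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics.

    Assumption: ABAnnotate may use a different interpolation rule.
    """
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def threshold_weights(weights, quant=0.9, cutoff=False, coocc=0.2):
    """weights: hypothetical {category: {gene: expression}} -> filtered copy.

    1. keep genes whose value exceeds the `quant` quantile of all values;
    2. if `cutoff`, binarize surviving weights to 1.0 (plain mean later);
    3. drop genes annotated to more than `coocc` of all categories.
    """
    all_values = [v for genes in weights.values() for v in genes.values()]
    thr = quantile(all_values, quant)
    kept = {cat: {g: (1.0 if cutoff else v) for g, v in genes.items() if v > thr}
            for cat, genes in weights.items()}
    counts = {}  # number of categories each surviving gene is annotated to
    for genes in kept.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    n_cat = len(weights)
    return {cat: {g: v for g, v in genes.items() if counts[g] / n_cat <= coocc}
            for cat, genes in kept.items()}

toy = {
    "c1": {"A": 10, "B": 1},
    "c2": {"A": 9, "C": 8},
    "c3": {"D": 7, "B": 2},
    "c4": {"E": 3, "F": 4},
    "c5": {"G": 6, "H": 5},
}
print(threshold_weights(toy, quant=0.5))
# {'c1': {}, 'c2': {'C': 8}, 'c3': {'D': 7}, 'c4': {}, 'c5': {'G': 6}}
```

In the toy run, gene A survives the quantile step in two of five categories (40% > 20%) and is therefore removed as unspecific.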
Default GCEA options are imported from gcea_default_settings.m.
ABAnnotate's main output consists of a table with as many rows as there are categories in the current dataset.
Below you see an example output from the neuronal cell type dataset (transcripts per million; TPM). Here, we have marker sets for 24 cell types (Ex/In = excitatory/inhibitory neuron subclasses; see Lake et al. (2016) for details). The three top categories are significant at FDR-corrected p < .05 using the nonparametric procedure.
Table columns:
- cLabel: category name
- cDesc: category description
- cSize: number of genes annotated to the category
- cGenes: official gene symbols
- cWeights: expression value for each gene; a vector of ones for discrete datasets (most cases)
- cScoresNull: null category scores (here, 5000 null samples)
- cScorePheno: category score of the phenotype, e.g., the mean of r-to-z-transformed phenotype-gene Spearman correlation coefficients over all genes in the category
- pValPerm: exact p-value derived from the null distribution of category scores
- pValPermCorr: FDR-corrected "q"-value
- pValZ(Corr): p-value derived from a Z-distribution fitted to the null data, to approximate very small p-values
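The three kinds of p-values in the table can be illustrated with a short Python sketch; it is not ABAnnotate's code. With 5000 null samples, the smallest nonzero permutation p-value is 1/5000 = 0.0002, which is why a normal distribution fitted to the null scores is useful for approximating smaller values. The following are assumptions: the permutation p-value is computed without a +1 correction, the FDR procedure is Benjamini-Hochberg, and `tail` mirrors `opt.GCEA.p_tail`.

```python
import math
from statistics import fmean, stdev

def perm_p(observed, nulls, tail="right"):
    """Share of null category scores at least as extreme as the observed one."""
    if tail == "right":
        hits = sum(n >= observed for n in nulls)
    else:
        hits = sum(n <= observed for n in nulls)
    return hits / len(nulls)

def z_p(observed, nulls, tail="right"):
    """Tail probability under a normal distribution fitted to the null scores."""
    z = (observed - fmean(nulls)) / stdev(nulls)
    if tail == "left":
        z = -z
    # 0.5 * erfc(z / sqrt(2)) is the upper-tail normal probability; erfc stays
    # accurate for large z, where 1 - cdf would round to 0.
    return 0.5 * math.erfc(z / math.sqrt(2))

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values ("q-values"), input order kept."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q

print(perm_p(2.0, [0, 1, 2, 3]), perm_p(2.0, [0, 1, 2, 3], tail="left"))  # 0.5 0.75
```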
Output files: a .mat file with the table (see above) and the input options struct, a .csv file with a reduced version of the table, an .xml file generated from the options struct, and a log file with the Matlab terminal output.
A/B: neuroimaging phenotype — neuronal cell type associations (see table above); A: bars representing category scores, color showing the negative base 10 logarithm of the uncorrected p-values derived from the z-distribution, *FDR-significant (nonparametric); B: gene-wise spatial correlation patterns for each gene annotated to one of the three significantly associated cell types.
C: neuroimaging phenotype — developmental brain-regional gene expression (BrainSpan); point size representing category scores, color showing the negative base 10 logarithm of the uncorrected p-values derived from the z-distribution, squares mark FDR-significant (nonparametric) categories.
In example/example_pain.md, I provide exemplary analyses using ABAnnotate to relate a meta-analytic brain map of pain processing to the integrated neuronal cell type markers, BrainSpan, and GeneOntology "biological process" datasets. In example/customization.md, I outline several implemented customization options.
If you use ABAnnotate in publications, please cite the following sources:
Do you have questions, comments or suggestions, would like to contribute to the toolbox, or would like to see a certain gene-category dataset added to the toolbox? Open an issue or contact me!