Pyblmm

This repository contains the code for BLMM (Big Linear Mixed Models for Neuroimaging), which supports both cluster and local usage.

Requirements

To use pyblmm, install it with pip:

pip install pyblmm

You must also set up your dask-jobqueue configuration file, which is typically located at ~/.config/dask/jobqueue.yaml. This requires some details about your HPC system; see the dask-jobqueue documentation for further detail. For instance, if you are using rescomp, your jobqueue.yaml file may look something like this:

jobqueue:
  slurm:
    name: dask-worker

    # Dask worker options
    cores: 1                 # Total number of cores per job
    memory: "100GB"                # Total amount of memory per job
    processes: 1                # Number of Python processes per job

    interface: ib0             # Network interface to use like eth0 or ib0
    death-timeout: 60           # Number of seconds to wait if a worker can not find a scheduler
    local-directory: "/path/of/your/choosing/"       # Location of fast local storage like /scratch or $TMPDIR
    log-directory: "/path/of/your/choosing/"
    silence_logs: True

    # SLURM resource manager options
    shebang: "#!/usr/bin/bash"
    queue: short
    project: null
    walltime: '01:00:00'
    job-cpu: null
    job-mem: null

    # Scheduler options
    scheduler-options: {'dashboard_address': ':46405'}
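
Once this configuration file is in place, you can sanity-check it from Python. The snippet below is a minimal sketch, assuming dask-jobqueue is installed in your environment; it simply builds a cluster object from your configuration and prints the SLURM job script that dask would submit.

from dask_jobqueue import SLURMCluster

# SLURMCluster() picks up its defaults from the jobqueue.slurm section of
# ~/.config/dask/jobqueue.yaml, so the printed job script should reflect the
# queue, memory and walltime specified above.
cluster = SLURMCluster()
print(cluster.job_script())
cluster.close()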

If running the BLMM tests on a cluster, fsl_sub must also be configured correctly.

Usage

To run BLMM-py, first specify your design using blmm_config.yml and then run your analysis following the guidelines below.

Specifying your model

The regression model for BLMM must be specified in blmm_config.yml. Below is a complete list of possible inputs to this file.

Mandatory fields

The following fields are mandatory: Y_files, X, Z, outdir and contrasts. Example 1 below shows a minimal configuration using only these fields.

Optional fields

Optional fields include MAXMEM, data_mask_files, data_mask_thresh, Missingness and analysis_mask; Example 2 below shows several of these in use.

Examples

Below are some example blmm_config.yml files.

Example 1: A minimal configuration.

Y_files: /path/to/data/Y.txt
X: /path/to/data/X.csv
Z:
  - f1: 
      name: factorName
      factor: /path/to/data/Z1factorVector.csv
      design: /path/to/data/Z1DesignMatrix.csv
outdir: /path/to/output/directory/
contrasts:
  - c1:
      name: Tcontrast1
      vector: [1, 0, 1, 0, 1]

Example 2: A configuration with multiple optional fields.

MAXMEM: 2**32
Y_files: /path/to/data/Y.txt
data_mask_files: /path/to/data/M_.txt
data_mask_thresh: 0.1
X: /path/to/data/X.csv
Z:
  - f1: 
      name: factorName
      factor: /path/to/data/Z1factorVector.csv
      design: /path/to/data/Z1DesignMatrix.csv
  - f2: 
      name: factorName2
      factor: /path/to/data/Z2factorVector.csv
      design: /path/to/data/Z2DesignMatrix.csv
outdir: /path/to/output/directory/
contrasts:
  - c1:
      name: Tcontrast1
      vector: [1, 0, 0]
  - c2:
      name: Tcontrast2
      vector: [0, 1, 0]
  - c3:
      name: Tcontrast3
      vector: [0, 0, 1]
  - c4:
      name: Fcontrast1
      vector: [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Missingness:
  MinPercent: 0.10
  MinN: 15
analysis_mask: /path/to/data/MNI152_T1_2mm_brain_mask.nii.gz

Running the Analysis

BLMM can be run from the terminal as follows:

blmm <name_of_your_yaml_file>.yml

You can monitor the progress of your analysis either with qstat or squeue (depending on your system), or via the interactive dask dashboard. To use the dashboard, tunnel into your HPC from a separate terminal as follows:

ssh -L <local port>:localhost:<remote port> username@hpc_address

where <local port> is the port you want to use on your local machine and <remote port> is the dask dashboard port (for instance, if you are on rescomp and used the above jobqueue.yaml, <remote port> is 46405). You can now open http://localhost:<local port>/ in a browser on your local machine to watch the analysis run.

Analysis Output

Below is a full list of image files output after a BLMM analysis.

blmm_vox_mask: The analysis mask.
blmm_vox_n: A map of the number of input images which contributed to each voxel in the final analysis.
blmm_vox_edf: The spatially varying naive degrees of freedom*.
blmm_vox_beta: The beta (fixed effects parameter) estimates.
blmm_vox_sigma2: The sigma2 (fixed effects variance) estimates.
blmm_vox_D: The D (random effects variance) estimates**.
blmm_vox_llh: The log likelihood values.
blmm_vox_con: The contrasts multiplied by the estimate of beta (the same as COPE in FSL).
blmm_vox_cov: The between-beta covariance estimates.
blmm_vox_conSE: The standard errors of the contrasts multiplied by beta (only available for T contrasts).
blmm_vox_conR2: The partial R^2 maps for the contrasts (only available for F contrasts).
blmm_vox_resms: The residual mean squares map for the analysis***.
blmm_vox_conT: The T statistics for the contrasts (only available for T contrasts).
blmm_vox_conF: The F statistics for the contrasts (only available for F contrasts).
blmm_vox_conTlp: Maps of -log10 of the uncorrected P values for the contrasts (T contrast).
blmm_vox_conT_swedf: Maps of the Satterthwaite degrees of freedom estimates for the contrasts (T contrast).
blmm_vox_conFlp: Maps of -log10 of the uncorrected P values for the contrasts (F contrast).
blmm_vox_conF_swedf: Maps of the Satterthwaite degrees of freedom estimates for the contrasts (F contrast).

The maps are given the same ordering as the inputs. For example, in blmm_vox_con, the 0th volume corresponds to the 1st contrast, the 1st volume corresponds to the 2nd contrast and so on. For covariances, the ordering is of a similar form with covariance between beta 1 and beta 1 (variance of beta 1) being the 0th volume, covariance between beta 1 and beta 2 being the 1st volume and so on. In addition, a copy of the design is saved in the output directory as inputs.yml. It is recommended that this be kept for data provenance purposes.

* These degrees of freedom are not used in inference and are only given for reference. The degrees of freedom used in inference are the Satterthwaite approximations given in blmm_vox_conT_swedf and blmm_vox_conF_swedf.
** The D estimates are ordered as vech(D1),...,vech(Dr), where Dk is the random effects covariance matrix for the kth random factor, r is the total number of random factors in the design, and vech represents "half-vectorisation".
*** This is optional and may differ from the estimate of sigma2, which accounts for the random effects variance.
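
For reference, the half-vectorisation used in the ordering of blmm_vox_D can be sketched as follows. This is a standalone numpy illustration, stacking the lower-triangular elements (including the diagonal) column by column; the example matrix is made up.

import numpy as np

def vech(D):
    # Stack the lower-triangular elements of D (including the diagonal),
    # column by column, into a single vector.
    p = D.shape[0]
    return np.concatenate([D[j:, j] for j in range(p)])

# Illustrative 2x2 random effects covariance matrix for a single factor.
Dk = np.array([[4.0, 1.0],
               [1.0, 9.0]])
print(vech(Dk))  # [4. 1. 9.]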

Model Comparison

Note: The code described in this section is currently not supported. Please contact Tom Maullin for further information.

BLMM-py also offers model comparison for nested single-factor models via Likelihood Ratio Tests under a 50:50 chi^2 mixture distribution assumption (cf. Linear Mixed Models for Longitudinal Data. 2000. Verbeke, G. & Molenberghs, G. Chapter 6, Section 3). To compare the output of two single-factor models in BLMM-py (or the output of a single-factor model from BLMM-py with the output of a corresponding linear model run using BLM-py), run the following command:

bash ./blmm_compare.sh /path/to/the/results/of/the/smaller/model/ /path/to/the/results/of/the/larger/model/ /path/to/output/directory/
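
For intuition, a p-value under a 50:50 chi^2 mixture can be computed as sketched below. This is an illustrative sketch only: it assumes the two models differ by a single variance component, so the mixture components are chi^2 with 0 and 1 degrees of freedom, and the statistic value is made up.

from scipy import stats

lrt_stat = 3.2  # hypothetical likelihood ratio statistic at one voxel

# chi^2 with 0 degrees of freedom is a point mass at zero, so for any positive
# statistic its tail probability is zero; only the chi^2_1 component contributes.
p_value = 0.5 * 0.0 + 0.5 * stats.chi2.sf(lrt_stat, df=1)
print(p_value)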

Below is a full list of image files output after a BLMM likelihood ratio comparison test.

blmm_vox_mask: The analysis mask (the intersection of the masks from each analysis).
blmm_vox_Chi2: The map of the likelihood ratio statistic.
blmm_vox_Chi2lp: The map of -log10 of the uncorrected P values for the likelihood ratio test.

Developer Notes

Testing

Currently, only unit tests are available for BLMM. These can be found in the tests/Unit folder and must be run from the top of the repository.

Notation

Throughout the code, the following notation is universal.

The following subscripts are also common throughout the code:

When the user has specified only 1 random factor with 1 random effect, the matrices DinvIplusZtZD and ZtZ become diagonal. In this setting, instead of saving these variables as matrices of dimension (v,q,q) (one (q,q) matrix for every voxel), we only record their diagonal elements, so DinvIplusZtZD and ZtZ have dimension (v,q) throughout the code. This results in significant performance gains.
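
A rough standalone sketch of this storage saving is given below (shapes and values are illustrative, not taken from the code):

import numpy as np

v, q = 1000, 50  # illustrative numbers of voxels and random effects columns
rng = np.random.default_rng(0)

# Diagonal-only representation used when there is 1 random factor and 1 random
# effect: one row of q diagonal entries per voxel.
diag_vals = rng.random((v, q))

# Equivalent full representation: one (q, q) diagonal matrix per voxel.
full = np.zeros((v, q, q))
full[:, np.arange(q), np.arange(q)] = diag_vals

print(full.nbytes // diag_vals.nbytes)  # the full form is q times larger (50x here)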

Structure of the repository

The repository contains 4 main folders, plus 3 files at the head of the repository. These are: