MINI-AC

MINI-AC stands for Motif-Informed Network Inference based on Accessible Chromatin, a method that combines accessible chromatin data from bulk or single-cell experiments with transcription factor binding site enrichment to learn gene regulatory networks (GRNs) in plants. This README is a tutorial on how to use MINI-AC and how to adjust its parameters to the desired settings.

MINI-AC is dual-licensed: the software is distributed under a proprietary model as well as an open source model.

Pipeline summary

  1. Generation of background model for input accessible chromatin regions (ACRs).
  2. Use of background ACRs to compare with real ACRs and compute motif enrichment statistics.
  3. Inference of a GRN based on motif enrichment results.
  4. Generation of a functional GRN by gene ontology (GO) enrichment of the regulons.
  5. Integration of data to generate informative, user-friendly output files.
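
Step 2 compares real ACRs against background ACRs. As a rough illustration of that idea only, the sketch below computes an empirical p-value for one motif from overlap counts. It is a generic technique and not necessarily the statistic MINI-AC implements. The function name, the input layout (one background count per line) and the add-one correction are all assumptions made for this example.

```shell
# empirical_enrichment OBSERVED < background_counts.txt
# Hypothetical helper, not part of MINI-AC.
# Reads one overlap count per line from stdin (one line per background ACR
# set) and prints: empirical p-value <TAB> mean background count.
empirical_enrichment() {
    awk -v obs="$1" '
        # Count background sets whose overlap is at least the observed one.
        NF { n++; sum += $1; if ($1 + 0 >= obs + 0) ge++ }
        END {
            if (n == 0) { print "no background counts given" > "/dev/stderr"; exit 1 }
            # Add-one correction (an assumption) keeps the p-value above zero.
            printf "%.4f\t%.2f\n", (ge + 1) / (n + 1), sum / n
        }'
}

# Toy numbers (made up): a motif overlaps 30 real ACRs and between 8 and 12
# ACRs in each of four background sets.
printf '10\n12\n8\n11\n' | empirical_enrichment 30
```

With more background sets the smallest attainable p-value shrinks, which is why such comparisons usually rely on many shuffled sets rather than a handful.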

MINI-AC currently supports two species: Arabidopsis thaliana and maize (genome versions B73 RefGen_v4 and B73 RefGen_v5). It can be run in two modes, genome-wide (genome_wide) and locus-based (locus_based), which differ in the non-coding genomic space considered for motif mapping.

A detailed overview of the required input files and the expected output files can be found in this example, which was run on maize V4 in genome-wide mode with a single-cell-derived ACR dataset of mesophyll and bundle sheath cells as input.

Inputs

The pipeline runs in parallel over multiple ACR BED input files. The two optional input files (the "DEG file" and the "Expressed genes file") can either be shared, with a single file of each type used for all input ACR datasets, or be provided per dataset, in which case each file must be paired with its ACR file. The pairing is done through the naming of the files. For further details consult here.

Outputs

Requirements

NOTE: MINI-AC was developed using Nextflow version 21.10.6 and Singularity version 3.8.7-1.el7, on a Sun Grid Engine (SGE) computer cluster.

Usage

Define the paths to the input files and the desired parameter settings in the configuration file, then run the pipeline with the following Nextflow command:

nextflow -C mini_ac.config run mini_ac.nf --mode <genome_wide|locus_based> --species <arabidopsis|maize_v4|maize_v5>
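
To make the placeholders concrete, the sketch below checks the two option values against the choices listed in the usage line and prints the resulting command. The helper `mini_ac_cmd` is hypothetical and not part of MINI-AC. It only prints the command, so it can be tried without Nextflow installed.

```shell
# mini_ac_cmd MODE SPECIES
# Hypothetical helper (not part of MINI-AC): validates the two options from
# the usage line and prints the Nextflow command instead of running it.
mini_ac_cmd() {
    case "$1" in
        genome_wide|locus_based) ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
    case "$2" in
        arabidopsis|maize_v4|maize_v5) ;;
        *) echo "unknown species: $2" >&2; return 1 ;;
    esac
    echo "nextflow -C mini_ac.config run mini_ac.nf --mode $1 --species $2"
}

# Command for a genome-wide run on maize V4.
mini_ac_cmd genome_wide maize_v4
```

Checking the values before launching catches a mistyped mode or species without waiting for the pipeline to start.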

Having problems running MINI-AC? Check the FAQ.

iCREs-based MINI-AC

Given the amount of resources available to profile regulatory DNA in maize, we curated a collection of integrated cis-regulatory elements (iCREs) by combining and comparing different CRE-profiling methods (details to be published).

We implemented a new framework that runs MINI-AC from a list of maize genes. It retrieves the genomic coordinates of the iCREs associated with the genes of interest and submits them to motif enrichment and GRN inference using the genome-wide mode of MINI-AC. iCREs-based MINI-AC can only be run for maize, not for Arabidopsis. Two sets of iCREs are available for a run: the "maxF1" (maxf1) set and the "all" (all) set. The first is a smaller but more precise set of putative CREs (fewer false positives), while the second is a more comprehensive collection of maize putative CREs.
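
The retrieval step can be pictured as a lookup of gene IDs in an iCREs table. The sketch below is illustrative only: it assumes a tab-separated BED file whose fourth column holds the associated gene ID, and a gene list with one ID per line. Both layouts are assumptions for this example and may not match the actual iCREs files or the pipeline's internal logic. `select_icres` is a hypothetical name.

```shell
# select_icres GENE_LIST ICRES_BED
# Hypothetical helper (not part of MINI-AC). Prints the BED lines whose
# fourth column (ASSUMED to be the associated gene ID) is in GENE_LIST.
select_icres() {
    awk '
        # First file: remember every non-empty gene ID.
        NR == FNR { if (NF) genes[$1] = 1; next }
        # Second file: keep regions annotated with a remembered gene.
        ($4 in genes)
    ' "$1" "$2"
}

# Usage (file names are placeholders):
#   select_icres genes_of_interest.txt icres_with_gene_ids.bed > selected_icres.bed
```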

To download the files with the genomic coordinates of the iCREs, execute the following commands from the top-level directory of the repository:

For maize RefGen_v4

  wget "https://zenodo.org/records/13143829/files/maxf1_icres_zma_v4.bed?download=1" -O data/icres/maxf1_icres_zma_v4.bed
  wget "https://zenodo.org/records/13143829/files/all_icres_zma_v4.bed?download=1" -O data/icres/all_icres_zma_v4.bed

For maize RefGen_v5

  wget "https://zenodo.org/records/11192739/files/maxf1_icres_zma_v5.bed?download=1" -O data/icres/maxf1_icres_zma_v5.bed
  wget "https://zenodo.org/records/11192739/files/all_icres_zma_v5.bed?download=1" -O data/icres/all_icres_zma_v5.bed

To run iCREs-based MINI-AC, prepare the configuration file as explained here. Only two parameters differ from a regular MINI-AC run. Instead of a BED file with ACR genomic coordinates, a list of gene IDs from maize genome version V4 or V5 should be provided, as exemplified here. In addition, an iCREs set should be specified (maxf1 or all). Then execute the following Nextflow command:

nextflow -C mini_ac_icres.config run mini_ac_icres.nf --icres_set <all|maxf1> --species <maize_v4|maize_v5>
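
Because the gene list must match the chosen genome version, a quick check before launching can save a failed run. The sketch below assumes a gene list with one ID per line, and that B73 V4 gene IDs start with `Zm00001d` while V5 gene IDs start with `Zm00001eb`, each followed by digits. `check_gene_ids` is a hypothetical helper, not part of MINI-AC.

```shell
# check_gene_ids SPECIES GENE_LIST
# Hypothetical helper (not part of MINI-AC). Prints every gene ID that does
# not look like an ID of the chosen maize genome version and returns 1 if
# any such ID is found. Assumes one gene ID per line.
check_gene_ids() {
    case "$1" in
        maize_v4) pattern='^Zm00001d[0-9]+$' ;;
        maize_v5) pattern='^Zm00001eb[0-9]+$' ;;
        *) echo "species must be maize_v4 or maize_v5" >&2; return 2 ;;
    esac
    # Skip blank lines; report IDs that do not match the expected prefix.
    awk -v re="$pattern" '
        NF && $1 !~ re { print $1; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$2"
}

# Usage (file name is a placeholder):
#   check_gene_ids maize_v5 genes_of_interest.txt && echo "gene list looks consistent"
```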

Support

Should you encounter a bug or have any questions or suggestions, please open an issue.

Citation

When publishing results generated using MINI-AC, please cite:

Nicolás Manosalva Pérez, Camilla Ferrari, Julia Engelhorn, Thomas Depuydt, Hilde Nelissen, Thomas Hartwig, and Klaas Vandepoele. “MINI-AC: Inference of Plant Gene Regulatory Networks Using Bulk or Single-Cell Accessible Chromatin Profiles.” The Plant Journal 117, no. 1 (2024): 280–301. https://doi.org/10.1111/tpj.16483.

Contact

If you have any questions, encounter issues, or want to contribute to this project, please feel free to reach out to Klaas Vandepoele (klaas.vandepoele@psb.vib-ugent.be).