HYsxe / PRINT


multiScaleFootprinting


1. Overview

This GitHub repo presents the multi-scale footprinting framework described in our Hu et al. paper. The algorithm takes ATAC-seq data (bulk or single-cell) as input and detects DNA-protein interactions across spatial scales. More specifically, the model first corrects for the sequence preference of Tn5 internally, and then uses a statistical model to calculate a footprint score for each position within enhancers and promoters. This process is repeated over a range of footprint kernel sizes, capturing DNA-binding proteins of different sizes and shapes. Conceptually, the procedure is similar to wavelet analysis, where the input signal is decomposed at each location across scales.
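To make the idea of scanning one locus with several kernel sizes concrete, here is a minimal Python sketch. It is not the statistical model from the paper or the API of this repo: the function name `multiscale_footprint`, the center-versus-flank depletion score, and the simple division by a per-base bias value are all assumptions chosen for illustration.

```python
import numpy as np

def multiscale_footprint(observed, expected_bias, radii):
    """Toy multi-scale footprint scores (illustrative only, hypothetical scoring).

    observed      : 1-D array of Tn5 insertion counts per base pair
    expected_bias : 1-D array of relative Tn5 sequence preference (> 0) per base pair
    radii         : sequence of footprint radii in bp, one per scale
    Returns an array of shape (len(radii), len(observed)); larger values mean
    stronger depletion of insertions in the center window relative to its flanks.
    """
    observed = np.asarray(observed, dtype=float)
    bias = np.asarray(expected_bias, dtype=float)
    corrected = observed / bias          # crude stand-in for Tn5 bias correction
    n = corrected.size
    # Prefix sums let every window sum be computed in constant time.
    csum = np.concatenate([[0.0], np.cumsum(corrected)])

    def window_sum(start, end):
        # Sum over [start, end); windows are truncated at the array edges.
        start = np.clip(start, 0, n)
        end = np.clip(end, 0, n)
        return csum[end] - csum[start]

    pos = np.arange(n)
    scores = np.zeros((len(radii), n))
    for i, r in enumerate(radii):
        # Center and both flanks each span 2r + 1 bp.
        center = window_sum(pos - r, pos + r + 1)
        left = window_sum(pos - 3 * r - 1, pos - r)
        right = window_sum(pos + r + 1, pos + 3 * r + 2)
        flank_mean = (left + right) / 2.0
        # Normalized depletion; the +1 pseudocount avoids division by zero.
        scores[i] = (flank_mean - center) / (flank_mean + center + 1.0)
    return scores
```

Each row of the returned matrix corresponds to one kernel size, so a column read top to bottom is the multi-scale pattern at a single position. In this toy version, a small protected region scores highest at the radius that matches its width and fades at larger radii.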

The multi-scale footprint pattern at any genomic location delineates local chromatin structure and can be used to infer TF and nucleosome binding. We have shown in our paper that multi-scale footprints can be used as input to neural network models to predict TF binding, even for TFs that do not leave visible footprints on their own.

Additionally, we have implemented the infrastructure for generating pseudo-bulks from single-cell data and for running multi-scale footprinting on the pseudo-bulked data. This provides a unique opportunity to track chromatin structure dynamics across pseudo-time.
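As a rough illustration of pseudo-bulking, the Python sketch below sums per-cell insertion counts within groups of cells. The function name `pseudobulk_counts`, the dense cell-by-position matrix, and the group labels (for example, bins along pseudo-time) are hypothetical and do not reflect how this repo groups cells or stores data.

```python
import numpy as np

def pseudobulk_counts(cell_by_position, cell_groups):
    """Sum insertion counts over the cells in each group (illustrative only).

    cell_by_position : 2-D array of shape (n_cells, n_positions)
    cell_groups      : sequence of n_cells group labels
    Returns (labels, pseudobulk), where pseudobulk has shape
    (n_groups, n_positions) and its rows follow the sorted unique labels.
    """
    counts = np.asarray(cell_by_position)
    labels, group_index = np.unique(np.asarray(cell_groups), return_inverse=True)
    pseudobulk = np.zeros((labels.size, counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add: each cell's row is accumulated into its group's row.
    np.add.at(pseudobulk, group_index, counts)
    return labels, pseudobulk
```

Each pseudo-bulk row can then be treated like a bulk insertion profile, so scoring the rows in pseudo-time order gives one footprint track per stage.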


2. Key Components

3. Vignettes and Tutorials

Tutorials for running multi-scale footprinting on example data can be found here

Before running the tutorial, please download the pre-computed bias files from https://zenodo.org/record/7121027#.ZCbw4uzMI8N and put them in the data/shared/precomputedTn5Bias folder.

4. References

Hu et al., Multi-scale chromatin footprinting reveals wide-spread encoding of CRE substructures

5. Installation

Currently, the framework can be installed by cloning the GitHub repo.

6. Support

If you have any questions, please feel free to open an issue. You are also welcome to email me at yanhu@g.harvard.edu. We appreciate everyone's contribution!

7. Coming soon

Currently the tool is implemented in R, which does not handle certain computations in the most efficient way. We are working on an ultra-fast Python package, which will be released soon. Stay tuned!