quantification of footprint

qingnanl commented 2 years ago

Dear authors,

Thanks for building these tools. I would like to ask about the footprint analysis: is there a quantification metric for the footprints inferred by HINT-ATAC? With the default settings, the output is a BED file with the locations of the footprints. I understand that column 5 somehow indicates the size of the peak (the larger the value, the more reliable the footprint). I would just like to know whether there are other metrics (e.g., a p-value) indicating how likely a footprint is to be real. I noticed some hidden options in the HINT-ATAC Python code, but I do not fully understand what they mean. Thanks.

qingnanl

lzj1769 commented 2 years ago

Hi @qingnanl

The 5th column represents the number of reads around the predicted footprint. Based on our evaluation, this value is a good metric for selecting reliable footprints.

So, you can apply a threshold to this column to select a subset of footprints.
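For illustration, here is a minimal Python sketch of that thresholding. It assumes a tab-separated BED file with the read count in column 5 (other columns are ignored). The function names, the cut-off of 20, and the toy records are hypothetical; the threshold has to be chosen for your own data.

```python
import io
from typing import List, TextIO


def read_footprints(bed: TextIO) -> List[List[str]]:
    """Parse a tab-separated BED stream, skipping blank, comment and header lines."""
    records = []
    for line in bed:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        records.append(line.split("\t"))
    return records


def filter_by_reads(records: List[List[str]], min_reads: float) -> List[List[str]]:
    """Keep footprints whose column 5 (0-based index 4, read count) is >= min_reads."""
    return [r for r in records if float(r[4]) >= min_reads]


def top_n_by_reads(records: List[List[str]], n: int) -> List[List[str]]:
    """Return the n footprints with the most reads in column 5, highest first."""
    return sorted(records, key=lambda r: float(r[4]), reverse=True)[:n]


if __name__ == "__main__":
    # Toy records for illustration only; with real data, pass open("your_footprints.bed").
    toy_bed = io.StringIO(
        "chr1\t100\t120\t.\t35\n"
        "chr1\t500\t515\t.\t8\n"
        "chr2\t900\t930\t.\t21\n"
    )
    footprints = read_footprints(toy_bed)
    for rec in filter_by_reads(footprints, min_reads=20):
        print("\t".join(rec))
```

With the toy input, the chr1:100-120 and chr2:900-930 records pass the cut-off. `top_n_by_reads` is an alternative when you would rather keep a fixed number of footprints than pick an absolute threshold.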

best, Zhijian

qingnanl commented 2 years ago

Thank you. Just curious, how does this number relate to the 'protection score'?

qingnanl

lzj1769 commented 2 years ago

Hi,

The protection score is calculated for each TF by combining the signal from all the predicted binding sites for this TF after footprinting.

This score can be used to quantify the binding activity of the TF in a sample. We also found that this score is highly correlated with the binding time of a TF: if a TF binds to DNA for a relatively long time, it leaves a clear footprint, so its protection score is expected to be high.

See Fig. 5 here for an example: https://www.nature.com/articles/nmeth.3772/figures/5
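The exact protection-score formula is not given in this thread. The Python sketch below only illustrates the idea of pooling signal over all predicted sites of one TF. It assumes you already have equal-length, site-centered count profiles, and it uses a simple "mean flank signal minus mean core signal" score. The function names and the core/flank sizes are hypothetical, and the score reg-gen actually reports may use a different window, normalization, or bias correction.

```python
from typing import List, Sequence


def average_profile(site_profiles: Sequence[Sequence[float]]) -> List[float]:
    """Position-wise mean of equal-length signal profiles centered on each binding site."""
    if not site_profiles:
        raise ValueError("need at least one binding site")
    width = len(site_profiles[0])
    if any(len(p) != width for p in site_profiles):
        raise ValueError("all profiles must have the same length")
    n = len(site_profiles)
    return [sum(p[i] for p in site_profiles) / n for i in range(width)]


def flank_minus_core(profile: Sequence[float], core: int, flank: int) -> float:
    """ASSUMED simplified protection score: mean flank signal minus mean core signal.

    core  -- number of central positions treated as the protected footprint
    flank -- number of positions taken on EACH side, immediately outside the core
    """
    width = len(profile)
    start = (width - core) // 2
    end = start + core
    if core <= 0 or flank <= 0 or start - flank < 0 or end + flank > width:
        raise ValueError("profile too short for the requested core/flank sizes")
    core_mean = sum(profile[start:end]) / core
    flank_vals = list(profile[start - flank:start]) + list(profile[end:end + flank])
    flank_mean = sum(flank_vals) / (2 * flank)
    return flank_mean - core_mean


if __name__ == "__main__":
    # Two toy, hypothetical per-site count profiles (7 bp each) for one TF.
    sites = [
        [4, 4, 1, 0, 1, 4, 4],
        [2, 2, 1, 0, 1, 2, 2],
    ]
    aggregate = average_profile(sites)
    print("aggregate score:", flank_minus_core(aggregate, core=3, flank=2))
```

Calling `flank_minus_core` on a single site's profile instead of the average gives a per-site score. With real data, one site often has only a handful of reads, so such per-site scores can be very noisy. Pooling all sites of a TF is what makes the aggregate score stable.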

Regarding the number in the 5th column: this value is specific to a single footprint and is not tied to any TF, unless motif matching is performed to associate footprints with TFs.
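To make that association step concrete, here is a minimal Python sketch. It assumes you already have motif matches as simple (chrom, start, end, TF) records and groups the footprints by the TFs whose motif hits overlap them. The record layout, function names, TF labels, and coordinates are all hypothetical, and the motif-matching output format is not described in this thread. The naive loop is meant only for illustration; for genome-scale files, a dedicated interval tool such as `bedtools intersect` would be the usual choice.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified record types.
Footprint = Tuple[str, int, int, float]  # (chrom, start, end, reads from BED column 5)
MotifHit = Tuple[str, int, int, str]     # (chrom, start, end, TF name)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open BED intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def footprints_per_tf(footprints: List[Footprint],
                      motif_hits: List[MotifHit]) -> Dict[str, List[Footprint]]:
    """Group footprints by the TFs whose motif hits overlap them (each footprint once per TF)."""
    by_chrom: Dict[str, List[Footprint]] = defaultdict(list)
    for fp in footprints:
        by_chrom[fp[0]].append(fp)
    hits: Dict[str, Set[Footprint]] = defaultdict(set)
    for chrom, m_start, m_end, tf in motif_hits:
        for fp in by_chrom.get(chrom, []):
            if overlaps(fp[1], fp[2], m_start, m_end):
                hits[tf].add(fp)
    return {tf: sorted(fps) for tf, fps in hits.items()}


if __name__ == "__main__":
    fps = [("chr1", 100, 120, 35.0), ("chr1", 500, 515, 8.0), ("chr2", 900, 930, 21.0)]
    motifs = [("chr1", 105, 115, "CTCF"), ("chr2", 925, 940, "CTCF"),
              ("chr1", 510, 520, "GATA1"), ("chr1", 300, 310, "GATA1")]
    for tf, tf_fps in footprints_per_tf(fps, motifs).items():
        print(tf, len(tf_fps), "footprint(s),", sum(fp[3] for fp in tf_fps), "reads")
```

Once footprints are grouped by TF this way, their column-5 read counts can be summarized per TF. That is still a different quantity from the aggregated protection score described above.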

One can of course estimate a protection score for each binding site, but the signal around a single binding site might be too sparse to get a meaningful score.

Again, in Fig. 5b-c, the line plots show the open-chromatin profiles aggregated over all binding sites (n = XXX in the upper-left corner).

Best, Zhijian

qingnanl commented 2 years ago

Thanks, Zhijian. That is very helpful. Let me close this. Best,

qingnanl

CostaLab / reg-gen

quantification of footprint #207