CostaLab / reg-gen

Regulatory Genomics Toolbox: Python library and set of tools for the integrative analysis of high throughput regulatory genomics data.
https://reg-gen.readthedocs.io/

RGT-Hint footprinting and differential analysis output #64

Open mdurante1 opened 6 years ago

mdurante1 commented 6 years ago

Hi,

I have been able to successfully run the RGT-Hint pipeline following the tutorial on the website (https://www.regulatory-genomics.org/hint/tutorial/) to conduct footprinting and differential analysis on ATAC-seq data, and I have obtained the plots generated by the "rgt-hint differential" command. Can you please explain what value is plotted on the y-axis and how it is obtained? Is there a way to get the raw data used to generate these plots, so that I can make different plots? I would like to see which specific genomic regions contribute most to the observed differences in the plot. When I look in IGV (using the .bw files) at regions listed in the _mpbs.bed file for a given TF, I don't see many differences, so I would like to quantify which regions have large differences in TF binding probability.

Also, is there a way to assess statistical significance, to see whether the plots depict a significant change? Some plots show differences in some areas and not in others, which makes it difficult to tell whether that TF has a significant difference in binding probability.

Thanks for all of your help, it is greatly appreciated.

Best, Michael

lzj1769 commented 6 years ago

Hi @mdurante1 ,

The y-axis is the average ATAC-seq signal around the predicted transcription factor binding sites.
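As a minimal illustration, assuming the per-site signal has already been extracted into a matrix with one row per predicted binding site and one column per position around the motif centre (this layout is an assumption, not rgt-hint's internal representation), the plotted average is simply the column-wise mean:

```python
import numpy as np

def average_profile(profiles: np.ndarray) -> np.ndarray:
    """Column-wise mean of a (n_sites, n_positions) signal matrix.

    Each row is the ATAC-seq signal in a fixed window centred on one
    predicted binding site; the result has one value per position,
    i.e. the kind of curve shown on the y-axis.
    """
    if profiles.ndim != 2 or profiles.shape[0] == 0:
        raise ValueError("expected a non-empty 2-D array (sites x positions)")
    return profiles.mean(axis=0)

# Toy data (made up): 3 binding sites, 5 positions around each site.
toy = np.array([
    [1.0, 2.0, 0.5, 2.0, 1.0],
    [3.0, 4.0, 0.5, 4.0, 3.0],
    [2.0, 3.0, 0.5, 3.0, 2.0],
])
avg = average_profile(toy)  # per-position means: 2.0, 3.0, 0.5, 3.0, 2.0
```

The dip at the centre position in the toy data mimics the protected footprint at the motif itself.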

In our latest release, we added the option --output-profiles, which writes the footprint profiles to a text file in which each row represents one instance of the given motif. We also added a statistical test to the scatter plot to highlight significant factors.
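Once per-instance profiles are loaded into arrays, the instances that contribute most to a difference can be ranked. A minimal sketch, assuming one array per condition whose rows refer to the same motif instances in the same order (how to load them depends on the actual layout of the --output-profiles file, which is not shown here). The score, summed signal over the window, is an illustrative choice rather than rgt-hint's method, and this ranking is not a significance test:

```python
import numpy as np

def rank_instances(cond1: np.ndarray, cond2: np.ndarray, top: int = 10):
    """Rank motif instances by |total signal in cond2 - total signal in cond1|.

    cond1, cond2: (n_sites, n_positions) arrays with matching rows.
    Returns (row indices, signed differences) for the `top` instances
    with the largest absolute change.
    """
    if cond1.shape != cond2.shape:
        raise ValueError("both conditions must cover the same instances and window")
    diff = cond2.sum(axis=1) - cond1.sum(axis=1)
    # Sort by descending absolute difference; stable keeps ties in input order.
    order = np.argsort(-np.abs(diff), kind="stable")[:top]
    return order, diff[order]

# Toy data (made up): 3 instances, 2 positions each.
c1 = np.array([[1.0, 1.0], [5.0, 5.0], [2.0, 2.0]])
c2 = np.array([[1.0, 2.0], [1.0, 1.0], [2.0, 2.0]])
idx, d = rank_instances(c1, c2, top=2)
# instance 1 changes most (-8.0), then instance 0 (+1.0)
```

The returned row indices can then be mapped back to the genomic coordinates of each motif instance to inspect them in IGV.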

Best, Li

YingziZhang-github commented 1 year ago

Hi @lzj1769 ,

I used rgt-hint to do footprinting analysis.

The command I used was rgt-hint differential --organism hg38 --bc --nc 20 --standardize --mpbs-files condition1.bed,condition2.bed --reads-files condition1.bam,condition2.bam --conditions condition1,condition2 --output-prefix footprinting_differential --output-location=footprinting_standardize. With the default lfc value, many TFs in the output log2(fold change) plot lie far outside the plot area. When I set lfc to 2 or 20, those far-away TFs simply disappeared. Do you have suggestions for solving this? The TFs that fall far outside the plot should be the highly significant ones, since they differ so much. By the way, the dot colors in the plot differ from those in the legend.

Thank you very much. Looking forward to your reply.

Using -lfc 0.1, many TFs lie far outside the plot area.
[Figure: log2(fold change) plot, -lfc 0.1]

Using -lfc 2:
[Figure: log2(fold change) plot, -lfc 2]

Using -lfc 20:
[Figure: log2(fold change) plot, -lfc 20]

Yingzi

lzj1769 commented 1 year ago

@minashaigan

Any ideas about this issue?

YingziZhang-github commented 1 year ago


Dear Zhijian,

Thank you very much for the reply. I am looking forward to your feedback! If you recommend it, I can also customize the plot by redrawing it from the rgt output data. The output files I can see are (named by default) _differential_factor.txt and _differential_statistics.txt. Could you suggest whether and how I can use the values in them to redraw the log2(fold change) plot and the activity statistics plot? Is the log2(fold change) discussed above equal to the log2 value of "TF_Activity" in _differential_statistics.txt?

So far, rgt has given me many exciting results. It would be really nice if I could customize and polish the rgt output figures.

Thank you very much.

Yingzi

minashaigan commented 1 year ago

Hello Yingzi,

To make the figure symmetric, I set the x-axis limits from the rounded maximum of abs(log2(FoldChange)). All of your fold changes are smaller than 0.5, so this maximum rounds to 0. I will therefore replace round with ceil.
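The effect of that change can be seen in a small sketch, assuming the limits are taken as plus/minus max|log2(FoldChange)|; the helper name symmetric_xlim is hypothetical and this is not the rgt-hint plotting code:

```python
import math

def symmetric_xlim(log2_fc, use_ceil=True):
    """Return (-m, m), where m is max |log2FC| rounded (or ceiled) to an integer."""
    m = max(abs(v) for v in log2_fc)
    m = math.ceil(m) if use_ceil else round(m)
    return (-m, m)

fcs = [0.12, -0.31, 0.45]  # made-up fold changes, all below 0.5
print(symmetric_xlim(fcs, use_ceil=False))  # (0, 0): a zero-width axis
print(symmetric_xlim(fcs))                  # (-1, 1)
```

With round, every value below 0.5 collapses the axis to zero width, so all points land outside it; ceil always yields a limit at least as large as the largest observed value.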

Yes, the log2(Fold Change) is equal to the difference between the log2 values of "TF_Activity" in differential_statistics.txt.
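In other words, log2(FC) = log2(activity in one condition) - log2(activity in the other). A minimal sketch for recomputing it from two TF_Activity values taken from differential_statistics.txt; which condition is the numerator is an assumption here, so check the sign against the plot:

```python
import math

def log2_fold_change(activity_cond1: float, activity_cond2: float) -> float:
    """log2(FC) as the difference of log2 TF activities.

    Assumes cond2 is compared against cond1; swap the arguments if the
    plot uses the opposite order. Both activities must be positive,
    since log2 is undefined for zero or negative values.
    """
    if activity_cond1 <= 0 or activity_cond2 <= 0:
        raise ValueError("TF activities must be positive to take log2")
    return math.log2(activity_cond2) - math.log2(activity_cond1)

print(log2_fold_change(2.0, 8.0))  # 2.0
```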

Thanks for the feedback, Mina

lzj1769 commented 1 year ago

Hi @YingziZhang-github

The file _differential_factor.txt contains the normalization factors that rgt-hint used to normalize the ATAC-seq signal between conditions, to account for different sequencing depths.
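To illustrate what a sequencing-depth factor does, here is a generic library-size scaling sketch. It is an assumption for illustration only: rgt-hint's factors may be computed differently, and whether its signal is divided or multiplied by them should be checked against the normalization factor file and the rgt-hint source:

```python
import numpy as np

def depth_factors(total_reads):
    """Library-size factors relative to the geometric mean across conditions.

    Generic illustration (hypothetical helper), not rgt-hint's own computation.
    """
    reads = np.asarray(total_reads, dtype=float)
    geo_mean = np.exp(np.log(reads).mean())
    return reads / geo_mean

def normalize(signal, factor):
    """Divide one condition's raw signal by its depth factor."""
    return np.asarray(signal, dtype=float) / factor

# Made-up depths: 20M and 80M reads -> geometric mean 40M -> factors of about 0.5 and 2.0
f = depth_factors([20e6, 80e6])
```

After this scaling, the deeper library's signal is divided by a larger factor, so both conditions are on a comparable scale before their profiles are averaged or compared.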

As @minashaigan pointed out, you can find the raw outputs in _differential_statistics.txt and use them to customize the plot.

Best, Zhijian

YingziZhang-github commented 1 year ago

Hi @minashaigan and @lzj1769 ,

Thank you for your answers. DifferentialAnalysis.py also helped a lot. My customized plots work very well.

Many thanks, Yingzi