deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

resolution hicPlotTADs #815

Closed MarineBergot closed 1 year ago

MarineBergot commented 2 years ago

Hi!

I'm still working with HiCExplorer and I have questions about plotting. I can't manage to produce a nice, clean plot. I don't know whether that is because my matrices are not good or because I'm not using the right plotting parameters. I first aligned my reads with bwa on hg38 and then created matrices for different bin sizes (10000, 50000, 100000, 500000) with --inputBufferSize 400000.

After that I corrected and normalized the matrices (with "smallest"). I then called TADs on the raw matrix (maybe that's not the best? I think I read in your docs that this is the best way).

To finish, I used mergeDomains.

I tried to plot the TADs for each bin size, but the result is really not good. My best attempt was with 50000. I used:

    [x-axis]
    where = top

    [hic matrix]
    file = C1829A_LR51_50000_norm_correct.h5
    title = 50kb
    # depth is the maximum distance plotted in bp. In Hi-C tracks
    # the height of the track is calculated based on the depth such
    # that the matrix does not look deformed
    depth = 1000000
    transform = log1p
    file_type = hic_matrix

    [tads]
    file = C1829A_LR51_TADs_50000_domains.bed
    file_type = domains
    border_color = black
    color = none
    # the tads are overlay over the hic-matrix
    # the share-y options sets the y-axis to be shared
    # between the Hi-C matrix and the TADs.
    overlay_previous = share-y

and used this command:

    hicPlotTADs --tracks hic_track3.ini -o test3.png --region chr7:156500000-160000000

I'm using the latest version of HiCExplorer (3.7) with Python 3.9 on a server, and I installed it with conda.

Thanks a lot for your help!

[Attached plots: 50kb, graph_5kb, 100kb test, 10kb test3]

lldelisle commented 2 years ago

Hi, there might be several issues. First, it depends on what you mean by nice. I don't know how many valid pairs you have, but the resolution you can use depends on your data: the more valid pairs you have in the 10 kb to 1 Mb distance range, the smaller you can make your bin size.

If you want to see the TADs: as you are working on a mammal, TADs are generally about 1 Mb, so 'depth' should always be more than 1 Mb; let's say 2 Mb (2000000).

Also, I don't know if it is on purpose, but you are looking at the end of the chromosome. This region may not be the best. I work in a lab that studies Hox genes, and the HoxA and HoxD clusters give very nice textbook TAD pictures. So if you want to check the quality of your Hi-C, you can plot the HoxA or HoxD region at all the resolutions you have (you can plot all of them on the same figure). If you are working with hg38, I would try chr2:174,813,091-177,342,770; you should see a beautiful boundary between two TADs.

I personally call TADs on corrected matrices. I think this is what is recommended here: https://hicexplorer.readthedocs.io/en/latest/content/example_usage.html#tad-calling
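To make the depth and bin-size advice concrete, here is a small Python sketch. It is plain arithmetic and not part of HiCExplorer; the helper name `bins_in_window` is made up for illustration. It counts how many bins cover the plotted region and how many bins fit within `depth`, using the region and bin sizes from the original question.

```python
def bins_in_window(length_bp: int, bin_size_bp: int) -> int:
    """Number of bins needed to cover length_bp (ceiling division)."""
    return -(-length_bp // bin_size_bp)


# Region and bin sizes taken from the question above.
region_start, region_end = 156_500_000, 160_000_000
bin_sizes = [10_000, 50_000, 100_000, 500_000]
# depth used in the question (1 Mb) and the suggested value (2 Mb).
depths = [1_000_000, 2_000_000]

for bin_size in bin_sizes:
    across = bins_in_window(region_end - region_start, bin_size)
    for depth in depths:
        deep = bins_in_window(depth, bin_size)
        print(
            f"bin={bin_size:>7} bp  depth={depth:>9} bp  "
            f"bins across region={across:>4}  bins within depth={deep:>4}"
        )
```

With 500 kb bins and a depth of 1 Mb, only two bins fit within the plotted depth, so a TAD of about 1 Mb cannot show up as a visible triangle at that setting.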

MarineBergot commented 2 years ago

Hi,

Thanks a lot for your answer! Unfortunately, yes, I need to work on this area, because this is where I have a genomic recombination in a foetus. So I tried to follow your recommendations. Here is my "best" try: the first plot is my control and the second is the foetus with the genomic recombination. The quality is not the same for both, but I hope it's still good enough? I used depth = 1000000 for both. I tried other values for the first one, but it's not really better. I'm still not an expert with Hi-C data: can we say that in the first one we can clearly see two TADs, a big one and a smaller one on the right? And that in the second one, the big one was split in two and a neo-TAD was created? Do you think we could improve this file? Thanks a lot for all your advice, Marine

[Attached plot: TAD_CTRL_agenesis]

lldelisle commented 2 years ago

Hi, I think it is better to compare both at the same resolution. Your second plot is indeed not as good as the first one, but I think it is sufficient to see what you want. I would compute the first one at the same resolution so you can compare them better. It looks like you have a deletion and that the contacts between the two parts of the initial TAD disappear. I guess this is because you have a rearrangement and both parts are now attached to another part of the genome. Yes, the big TAD is split in two, creating two 'neo-TADs', but you cannot tell where the neo-TADs end, as their limits can be in other parts of the genome.

MarineBergot commented 2 years ago

Hi, thanks again for your answer! OK, I will change the resolution :) Last question: if I have 4 replicates for one sample, what is the best way to aggregate them? I'm not sure hicMergeMatrixBins is made for that. Or maybe it's better to do it during mapping? Thanks again!

lldelisle commented 2 years ago

If you have already computed the matrices (before correction), hicSumMatrices does it. Otherwise, you can merge all read1 bam files and all read2 bam files with samtools merge, and then compute the matrices from the 2 merged bam files.
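The two routes can be sketched as command lines. The Python snippet below only builds and prints the commands; it does not run them. The replicate and file names are hypothetical, and the `--matrices` and `--outFileName` options are given to the best of my knowledge, so check `hicSumMatrices --help` before relying on them.

```python
import shlex


def sum_matrices_cmd(matrices, out_file):
    """Command line that sums uncorrected replicate matrices."""
    return ["hicSumMatrices", "--matrices", *matrices, "--outFileName", out_file]


def merge_bams_cmd(bams, out_bam):
    """Command line that merges the BAM files of one mate across replicates."""
    return ["samtools", "merge", out_bam, *bams]


# Hypothetical replicate names; replace with your own files.
replicates = ["rep1", "rep2", "rep3", "rep4"]

# Route 1: matrices already computed (before correction).
matrices = [f"{rep}_50000.h5" for rep in replicates]
print(shlex.join(sum_matrices_cmd(matrices, "merged_50000.h5")))

# Route 2: merge read1 and read2 BAM files separately, then build matrices.
for mate in ("R1", "R2"):
    bams = [f"{rep}_{mate}.bam" for rep in replicates]
    print(shlex.join(merge_bams_cmd(bams, f"merged_{mate}.bam")))
```

Either way, correction and normalization would then be applied to the merged matrix rather than to each replicate.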

lldelisle commented 2 years ago

Also, what you can do to highlight the difference between your 2 matrices (control and rearranged) is to run hicCompareMatrices. I personally prefer the difference over the log2ratio. This should highlight that you lose contacts in the diamond above the deleted part, where you have the rearrangement.
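As a toy illustration of the two operations (not the hicCompareMatrices implementation), the sketch below compares two tiny made-up contact matrices element by element. The counts, the pseudocount, and the "sample minus control" direction are all assumptions chosen for the example. It shows one possible reason to prefer the difference: a sparse bin going from 3 to 0 contacts gets a log2 ratio as large as a strong contact going from 40 to 10, whereas the difference keeps the strong loss clearly on top.

```python
import math


def compare(control, sample, operation, pseudocount=1.0):
    """Element-wise comparison of two equally sized contact matrices.

    The pseudocount is an assumption of this sketch; it only avoids
    dividing by zero or taking log2(0) in empty bins.
    """
    result = []
    for row_c, row_s in zip(control, sample):
        if operation == "diff":
            result.append([s - c for c, s in zip(row_c, row_s)])
        elif operation == "log2ratio":
            result.append(
                [math.log2((s + pseudocount) / (c + pseudocount))
                 for c, s in zip(row_c, row_s)]
            )
        else:
            raise ValueError(f"unknown operation: {operation}")
    return result


# Made-up counts: the off-diagonal contact is lost in the sample,
# and one sparse bin drops from 3 to 0.
control = [[100, 40], [40, 3]]
sample = [[100, 10], [10, 0]]

print(compare(control, sample, "diff"))
print(compare(control, sample, "log2ratio"))
```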

MarineBergot commented 2 years ago

Thanks a lot for all your answers and advice, it really helps me! I will rerun everything and try again with hicCompareMatrices. By any chance, do you know if it's possible to change the color of vlines? I saw it's possible for hlines. I tried to just set color = red in the section, but nothing changed; maybe it's not possible? Thanks!

lldelisle commented 2 years ago

For the moment it is not possible, I can write a PR for that but this will wait for the next release.

MarineBergot commented 2 years ago

Awesome, thanks a lot! Last question of the day: is it possible to draw the full chromosome? I tried --region chr7:0-158000000 with an svg output, but it's a complete fail ^^'

lldelisle commented 2 years ago

I need more details. Do you get an error, or can't you open the file?

MarineBergot commented 2 years ago

No, it's more that the plot looks like nothing ^^' I tried with:

    hicPlotTADs --tracks hic_track_all.ini -o TAD_CTRL_agenesis_fullchr7.svg --region chr7:0-158000000

The .ini file is the same.

[Attached plot: TAD_CTRL_agenesis_fullchr7]

lldelisle commented 2 years ago

I am not sure adding the vlines for each gene when you plot a whole chromosome is a good idea. You may also be interested in removing the labels for the genes at this scale.

lldelisle commented 2 years ago

Also you may be interested in increasing the depth of Hi-C. They look pretty small.
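A rough way to see why the Hi-C track looks so small at whole-chromosome scale: if the matrix is drawn undeformed, a contact at distance d sits at a height of about d/2 in the same units as the x-axis, so the track height relative to the plot width scales with depth divided by region length. The sketch below is only this geometric estimate, assuming depth is still 1000000; the function name is made up and the exact scaling used by the plotting code may differ.

```python
def relative_track_height(region_start, region_end, depth):
    """Approximate height of the Hi-C triangle as a fraction of plot width.

    Assumes an undeformed, 45-degree rotated matrix, where a contact at
    distance d is drawn at height d / 2 in x-axis units.
    """
    return (depth / 2) / (region_end - region_start)


# Zoomed region from the first command, depth = 1 Mb.
zoomed = relative_track_height(156_500_000, 160_000_000, 1_000_000)
# Whole chromosome 7 as requested, same depth.
whole = relative_track_height(0, 158_000_000, 1_000_000)
# Whole chromosome with a much larger depth (20 Mb, arbitrary example value).
whole_deep = relative_track_height(0, 158_000_000, 20_000_000)

print(f"zoomed region:              {zoomed:.4f} of plot width")
print(f"whole chr7, depth 1 Mb:     {whole:.4f} of plot width")
print(f"whole chr7, depth 20 Mb:    {whole_deep:.4f} of plot width")
```

Under these assumptions the whole-chromosome track is roughly 45 times flatter than the zoomed one at the same depth, which fits the advice to increase depth when plotting a full chromosome.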

lldelisle commented 1 year ago

More parameters for vlines have been implemented in the latest release of pyGenomeTracks, which will be available on conda later today.