Closed bostanict closed 1 year ago
I'm not familiar with the HICUP output format, but if it is text, it should be possible to convert it to the sparse matrix format. If you provide small subsets of the data, I can try to write a script. You can also try the Juicer format; the TADCompare vignette has instructions on how to obtain it: https://www.bioconductor.org/packages/release/bioc/vignettes/TADCompare/inst/doc/Input_Data.html#working-with-.hic-files
As for visualizing TADCompare results and external TAD data, there are many tools. Have a look at https://github.com/mdozmorov/HiC_tools#visualization; the choice depends on which programming environment you are most familiar with.
Hi @mdozmorov, thanks for getting back to me so quickly. Here are example outputs, starting with the one from HOMER after HICUP:
1 chrX 145207315 - chrUn_KI270742v1 13359 +
2 chrUn_KI270742v1 11867 - chrX 133811276 -
3 chrUn_KI270742v1 91413 - chrUn_KI270742v1 87814 +
4 chrUn_KI270742v1 15353 + chrX 26786267 +
5 chrUn_KI270742v1 15880 + chrX 145101514 +
6 chrUn_KI270742v1 139275 + chrUn_KI270742v1 134337 -
7 chrX 56351278 + chrUn_KI270742v1 14399 -
8 chrX 55213225 - chrUn_KI270742v1 37417 -
9 chrX 99319901 - chrUn_KI270742v1 161027 +
10 chrX 112058665 + chrUn_KI270742v1 176261 +
I also have the calls from HOMER for TADs and loops, which are in this format:
TADs:
chr6 43064999 43148998 chr6 43064999 43148998 255,255,0 2.928 2.928
chr20 60206999 60368998 chr20 60206999 60368998 255,255,0 1.911 1.911
chr11 128027999 128864998 chr11 128027999 128864998 255,255,0 2.372 2.372
chr13 97022999 97364998 chr13 97022999 97364998 255,255,0 2.879 2.879
chr3 20192999 22130998 chr3 20192999 22130998 255,255,0 1.943 1.943
chr2 172859999 172937998 chr2 172859999 172937998 255,255,0 2.323 2.323
chr20 20657999 20831998 chr20 20657999 20831998 255,255,0 2.883 2.883
chr6 33506999 33569998 chr6 33506999 33569998 255,255,0 1.634 1.634
chr9 90896999 91739998 chr9 90896999 91739998 255,255,0 3.010 3.010
chr1 60842999 61046998 chr1 60842999 61046998 255,255,0 2.175 2.175
Loops:
chr2 16770000 16773000 chr2 17667000 17670000 0,0,250 33.742222 2.244069
chr11 10626000 10629000 chr11 10734000 10737000 0,0,250 112.746944 1.957650
chr2 71301000 71304000 chr2 71511000 71514000 0,0,250 76.773333 2.257595
chr12 104208000 104211000 chr12 104355000 104358000 0,0,250 102.067778 2.029318
chr3 7410000 7413000 chr3 8109000 8112000 0,0,250 26.343333 1.601173
chr7 108972000 108975000 chr7 109824000 109827000 0,0,250 24.723333 1.768769
chr2 186996000 186999000 chr2 187449000 187452000 0,0,250 30.106667 1.589152
chr2 61992000 61995000 chr2 62775000 62778000 0,0,250 18.401917 1.665698
chr8 125118000 125121000 chr8 125697000 125700000 0,0,250 82.941667 2.588617
chr9 1461000 1464000 chr9 1608000 1611000 0,0,250 192.627778 2.516918
I am not sure whether TADCompare also works on loops, but I am looking forward to trying it.
Thanks
Something is incomplete. The HOMER output after HICUP has paired genomic coordinates but no interaction frequencies. What is the HICUP output?
The HICUP output is the read pairs in BAM format.
Is there another file that accompanies the pairs of genomic coordinates? Interaction frequencies are a critical missing piece.
BAM files are definitely not Hi-C matrices. Again, I'm familiar with the Juicer, HiC-Pro, and HiCExplorer pipelines, and I'm not sure how to extract interaction matrices from HICUP BAM files. You may explore the HiCExplorer and FAN-C pipelines for that, but I would use them in the first place instead of HICUP.
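For the paired-coordinate text shown earlier in this thread, one possible way to obtain interaction frequencies is to count read pairs per pair of genomic bins. The following is only a minimal Python sketch, not part of TADCompare. It assumes each row is one valid read pair with the seven whitespace-separated columns seen above (id, chr1, pos1, strand1, chr2, pos2, strand2); the function name `bin_read_pairs` and the 50 kb resolution are hypothetical choices for illustration. The result is a sparse 3-column layout (start of bin 1, start of bin 2, count) for a single chromosome.

```python
from collections import Counter

def bin_read_pairs(lines, chrom, resolution):
    """Count intra-chromosomal read pairs per (bin1_start, bin2_start).

    Assumes 7 whitespace-separated columns per line:
    id, chr1, pos1, strand1, chr2, pos2, strand2.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed or empty lines
        _, chr1, pos1, _, chr2, pos2, _ = fields[:7]
        if chr1 != chrom or chr2 != chrom:
            continue  # keep only pairs within the requested chromosome
        # Round each position down to the start of its bin.
        b1 = int(pos1) // resolution * resolution
        b2 = int(pos2) // resolution * resolution
        if b1 > b2:
            b1, b2 = b2, b1  # store the upper triangle only
        counts[(b1, b2)] += 1
    return sorted((b1, b2, n) for (b1, b2), n in counts.items())

# Three of the read pairs posted above; the chrX pair is inter-chromosomal
# and is therefore dropped.
example = [
    "3 chrUn_KI270742v1 91413 - chrUn_KI270742v1 87814 +",
    "6 chrUn_KI270742v1 139275 + chrUn_KI270742v1 134337 -",
    "1 chrX 145207315 - chrUn_KI270742v1 13359 +",
]
rows = bin_read_pairs(example, "chrUn_KI270742v1", 50_000)
for b1, b2, n in rows:
    print(b1, b2, n, sep="\t")
```

In practice the lines would be streamed from the file one chromosome at a time, so the whole genome never has to sit in memory. The counts are raw and unnormalized.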
Thanks a lot. Since you pointed me to the .hic files, I can generate those and use them. If that doesn't work, I will ping you again here.
I was able to convert the .hic files to the input format and run TADCompare, thanks a lot.
Can I use TADCompare for differential analysis of loops as well? How can we set it up to detect loops and run the differential analysis on them?
Thanks
TADCompare does not distinguish between TADs and loops. We call them "domains", and it compares domain boundaries, which include the boundaries of both TADs and loops.
Since the input is the same, how do you then distinguish whether a call is a TAD or a loop? Based on the length and shape of the interactions in the interaction matrix?
That's a general question: how do people distinguish TAD and loop boundaries? Length seems to be the most common criterion: smaller domains (<100 kb) may be considered loops, and larger ones (>100 kb and <2 Mb) may be TADs.
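The length rule of thumb above can be written down as a small helper. This is only an illustrative Python sketch, not something TADCompare does internally. The function name `classify_domain` and the "unclassified" label are hypothetical, and treating a domain of exactly 100 kb as a TAD is an assumption, since the rule leaves that case open.

```python
def classify_domain(start, end, loop_max=100_000, tad_max=2_000_000):
    """Label a domain by length: <100 kb loop, <2 Mb TAD, else unclassified."""
    length = end - start
    if length < loop_max:
        return "loop"
    if length < tad_max:
        # Assumption: exactly 100 kb falls on the TAD side.
        return "TAD"
    return "unclassified"

# Coordinates taken from the HOMER TAD calls posted earlier in this thread.
calls = [
    ("chr6", 43064999, 43148998),
    ("chr11", 128027999, 128864998),
    ("chr3", 20192999, 22130998),
]
labels = [(chrom, end - start, classify_domain(start, end))
          for chrom, start, end in calls]
for chrom, length, label in labels:
    print(chrom, length, label, sep="\t")
```

The thresholds are exposed as parameters because the cutoffs are conventions rather than fixed definitions.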
Hi,
I have a basic question here. I like the tool a lot but am having trouble generating the input file.
I have data from HICUP, which I can convert to homer, Juicer, hicpe, gothic and fithic formats. Is there any way to generate the input from any of these formats?
I tried to generate the contact matrix using HOMER tools (analyzeHiC), but since the chromosomes are really large, the process gets killed and fails. I would prefer not to break the chromosomes into chunks and process them one by one, so if there is an easier way, please let me know.
I also called TADs and loops using HOMER, and they are in BED format. Is there any way to use them directly in the pipeline?
Thanks a lot in advance
@mdozmorov @bostanict
Hi, I have the same issue: how can I convert the BAM file of valid read pairs from the HICUP pipeline to a contact matrix? Could you please give me some guidance if you know how?
Thanks.
We don't currently use HICUP. If you have just BAM files, https://hicexplorer.readthedocs.io/en/latest/content/tools/hicBuildMatrix.html can process them into .cool matrix files, and then https://hicexplorer.readthedocs.io/en/latest/content/tools/hicConvertFormat.html can extract text matrices. That said, I'd process the data as HiCExplorer recommends: https://hicexplorer.readthedocs.io/en/latest/content/example_usage.html#
@mdozmorov Thank you so much! I will let you know how it goes!