serine closed this issue 4 years ago
This is a perfect question for the BioC2020 workshop, https://github.com/mdozmorov/HiCcompareWorkshop. I hope you can join; the data format and hic.table will be discussed there.
As for immediate advice: if you look at all columns (don't use select), the columns will make more sense. See the workshop intro about data formats, https://mdozmorov.github.io/HiCcompareWorkshop/articles/hic_tutorial.html, and the HiCcompare vignette itself, https://www.bioconductor.org/packages/release/bioc/vignettes/HiCcompare/inst/doc/HiCcompare-vignette.html
@mdozmorov thanks for the pointer. I think I need more time to understand this and perhaps work through your multiHiCcompare example.
There is something magical about the D value (distance off the diagonal) and the difference between IFs (M value)... I think I understand that hic_table reports two regions that have high M with respect to D (right?) as "interesting". I just don't get why one region in one sample should have an effect on a different region in another sample?
I think hic_table is telling me that those two regions in those two samples are "interesting" and I should look at each one of those regions independently. Can I just look at either region1 or region2 and ignore the other?
By the way, yes, I'll be joining the conference, and I'm looking forward to the whole event and your workshop. Your workshop will be around 1 in the morning in my time zone, not too bad actually :D
Basically I'm working towards exactly those few things that you mention in your workshop: overlap with genes and promoters, and enrichment testing. This is why I'm asking which regions to use for gene annotation (I will work through your examples).
thanks
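On the question of which regions to use for annotation: each hic.table row describes an interaction between two bins, so one possible approach (an illustration, not necessarily what the workshop recommends) is to keep both anchors of every significant row as a single list of regions. A minimal base-R sketch, using the one significant row quoted further down in this thread; the names `sig` and `anchors` are hypothetical.

```r
# One significant interaction, coordinates copied from the hic.table row
# quoted later in this thread (chr22, 500 kb bins)
sig <- data.frame(
  chr1 = "chr22", start1 = 48000000, end1 = 48500000,
  chr2 = "chr22", start2 = 48500000, end2 = 49000000
)

# Stack the first and second anchor of each interaction into one table,
# then drop duplicated bins
anchors <- unique(rbind(
  data.frame(chr = sig$chr1, start = sig$start1, end = sig$end1),
  data.frame(chr = sig$chr2, start = sig$start2, end = sig$end2)
))

print(anchors)
```

The resulting table has one row per bin and could then be overlapped with gene or promoter coordinates using whatever annotation tool is preferred.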
@mdozmorov I'm not sure why, but it was pretty hard to figure this out:
> hic.table %>% filter(p.adj < 0.05) %>% arrange(start1, end1) %>% slice(15)
chr1 start1 end1 chr2 start2 end2 IF1 IF2 D M adj.IF1 adj.IF2 adj.M mc A Z p.value p.adj
1: chr22 48000000 48500000 chr22 48500000 49000000 2183 7019.283 1 1.685012 2269.657 6751.283 1.572687 0.1123242 4510.47 3.285699 0.001017295 0.03359926
> HMEC.chr22 %>% filter(region1 == 48000000, region2 == 48500000)
region1 region2 IF
1 48000000 48500000 2183
> NHEK.chr22 %>% filter(region1 == 48000000, region2 == 48500000)
region1 region2 IF
1 48000000 48500000 8094
I think I finally get hic_table: this region, 48000000-48500000, is differential between the two cell lines.
I'm still a little confused as to why end2 == 49000000; I'd have expected it to be end2 == 48500000. But I guess this isn't as important, thanks
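For anyone following the numbers, here is a minimal base-R sketch of how the columns in that row relate to each other. It assumes a 500 kb bin size (inferred from end1 - start1) and that region1/region2 in the sparse matrices are bin start coordinates; both assumptions are consistent with the output above but are a reading of it, not a statement from the package authors. Under that reading, region2 == 48500000 is start2, and end2 is simply one bin further along.

```r
# Values copied from the hic.table row shown above
start1 <- 48000000; end1 <- 48500000
start2 <- 48500000; end2 <- 49000000
IF1 <- 2183;         IF2 <- 7019.283
adj.IF1 <- 2269.657; adj.IF2 <- 6751.283

# Assumed bin size, inferred from the width of the first bin
resolution <- end1 - start1

# region2 matches start2, so the second bin ends one bin width later
end2_check <- start2 + resolution

# D: distance between the two bins, in units of bins
D <- (start2 - start1) / resolution

# M: log2 ratio of interaction frequencies (second data set over first)
M     <- log2(IF2 / IF1)
adj.M <- log2(adj.IF2 / adj.IF1)

# A: mean of the adjusted interaction frequencies
A <- (adj.IF1 + adj.IF2) / 2

print(c(D = D, M = M, adj.M = adj.M, A = A, end2 = end2_check))
```

These agree with the D, M, adj.M and A columns of the row to the precision shown. The IF2 value in hic.table (7019.283) differs from the raw NHEK count (8094), presumably because the second data set was scaled when the table was built; that part is a guess.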
Hi there,
for some reason I'm struggling to interpret the hic_compare() results table. If we just look at your example of these two cell lines, what we have are two different cell lines, and we are trying to figure out which regions of the chromosome are different, right? I'm used to working with expression data, where interpretation is "straightforward": a gene has gone up or down relative to another sample (baseline). However, looking at hic.table, I'm not sure what the interpretation should be. Below I'm showing three "differential regions" from the example data set, and I don't know whether I should interpret this as region1 (start1:end1) being "very" different, in terms of number of contacts, from region2 (start2:end2). So looking at the second line below, region1 (19000000:19500000) in the HMEC cell line is very different from region2 (48000000:48500000) in a different NHEK cell line, but those are two regions that are 29Mb apart in different cell lines? I'm not sure why you should expect those two regions to be the same in different cell lines. I guess the results table that I was expecting is a single column of regions that are different in the NHEK cell line relative to baseline (HMEC cell line)? I'm sure that I'm just not getting Hi-C data yet and would appreciate some help here, thanks
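One way to see what a row means: a Hi-C interaction frequency belongs to a pair of bins, and the comparison is made for the same pair in both cell lines, not between region1 in one sample and region2 in the other. A minimal base-R sketch of that idea, using the single HMEC.chr22 and NHEK.chr22 rows quoted elsewhere in this thread; the data frames are rebuilt by hand here, and `raw.M` is a hypothetical column name for an unnormalized log2 ratio, not something the package reports.

```r
# One bin pair from each sparse matrix (region1, region2, IF),
# values copied from the HMEC.chr22 / NHEK.chr22 output in this thread
HMEC <- data.frame(region1 = 48000000, region2 = 48500000, IF = 2183)
NHEK <- data.frame(region1 = 48000000, region2 = 48500000, IF = 8094)

# Match the SAME bin pair across the two cell lines
pairs <- merge(HMEC, NHEK,
               by = c("region1", "region2"),
               suffixes = c(".HMEC", ".NHEK"))

# Unnormalized log2 ratio of contact counts for that pair
pairs$raw.M <- log2(pairs$IF.NHEK / pairs$IF.HMEC)

print(pairs)
```

So the closest analogue of "a gene went up or down" is "the contact between bin A and bin B went up or down in NHEK relative to HMEC".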