Closed dawe closed 3 years ago
Hi,
thanks for testing this and giving me feedback. I will have a look at this and come back to you.
Best,
Joachim
As a fast issue to test: Can you run it with one thread instead of multiple ones? Could be a parallelization issue if that solves it.
I've tried to use one thread, it does something but... A step back: looking at the TAD IDs, using the multithreaded version I have these TADs skipped:
1 246700000 247700000 ID_0.01_168
2 1800000 3800000 ID_0.01_169
2 241100000 241800000 ID_0.01_339
3 3300000 4300000 ID_0.01_340
3 194100000 195700000 ID_0.01_481
4 1800000 3000000 ID_0.01_482
4 188500000 189400000 ID_0.01_618
5 1700000 3700000 ID_0.01_619
5 3700000 4900000 ID_0.01_620
5 177600000 179000000 ID_0.01_747
6 1500000 2700000 ID_0.01_748
6 2700000 4000000 ID_0.01_749
6 168500000 169300000 ID_0.01_871
7 1600000 2500000 ID_0.01_872
7 2500000 4600000 ID_0.01_873
7 155600000 157200000 ID_0.01_990
X 3800000 7900000 ID_0.01_991
X 7900000 8800000 ID_0.01_992
X 153000000 153800000 ID_0.01_1093
8 2000000 3200000 ID_0.01_1094
8 3200000 4900000 ID_0.01_1095
8 143700000 144400000 ID_0.01_1206
9 1000000 2100000 ID_0.01_1207
9 2100000 2900000 ID_0.01_1208
9 138500000 139600000 ID_0.01_1282
10 1400000 3200000 ID_0.01_1283
10 3200000 4800000 ID_0.01_1284
10 132300000 133700000 ID_0.01_1380
11 2200000 4200000 ID_0.01_1381
11 4200000 5100000 ID_0.01_1382
11 132900000 133800000 ID_0.01_1482
12 2100000 3200000 ID_0.01_1483
12 3200000 4000000 ID_0.01_1484
13 20700000 22200000 ID_0.01_1593
13 22200000 23300000 ID_0.01_1594
14 20200000 22100000 ID_0.01_1665
14 22100000 22700000 ID_0.01_1666
15 22600000 23900000 ID_0.01_1730
15 23900000 25800000 ID_0.01_1731
16 2000000 3100000 ID_0.01_1795
16 3100000 4400000 ID_0.01_1796
17 1000000 2300000 ID_0.01_1855
17 2300000 3500000 ID_0.01_1856
18 2600000 4000000 ID_0.01_1915
18 4000000 5000000 ID_0.01_1916
20 1500000 2700000 ID_0.01_1969
20 2700000 3700000 ID_0.01_1970
19 2700000 4700000 ID_0.01_2010
19 4700000 5700000 ID_0.01_2011
22 17400000 18800000 ID_0.01_2048
22 18800000 20200000 ID_0.01_2049
21 15600000 17000000 ID_0.01_2070
21 17000000 18700000 ID_0.01_2071
As you can see, the last of each chromosome and the next TAD (the first on the next chromosome) are skipped. The first TAD of chromosome 1, however, is in. Switching to a single thread (-t 1
) solves the "last TAD" but not the "first TAD":
2 1800000 3800000 ID_0.01_169
3 3300000 4300000 ID_0.01_340
4 1800000 3000000 ID_0.01_482
5 1700000 3700000 ID_0.01_619
6 1500000 2700000 ID_0.01_748
7 1600000 2500000 ID_0.01_872
X 3800000 7900000 ID_0.01_991
8 2000000 3200000 ID_0.01_1094
9 1000000 2100000 ID_0.01_1207
10 1400000 3200000 ID_0.01_1283
11 2200000 4200000 ID_0.01_1381
12 2100000 3200000 ID_0.01_1483
13 20700000 22200000 ID_0.01_1593
14 20200000 22100000 ID_0.01_1665
15 22600000 23900000 ID_0.01_1730
16 2000000 3100000 ID_0.01_1795
17 1000000 2300000 ID_0.01_1855
18 2600000 4000000 ID_0.01_1915
20 1500000 2700000 ID_0.01_1969
19 2700000 4700000 ID_0.01_2010
22 17400000 18800000 ID_0.01_2048
21 15600000 17000000 ID_0.01_2070
A quick workaround: I padded TADs adding the regions from 0 to first TAD and from last TAD to end of chromosome, for each chromosome. Then I ran hicDifferentialTAD
in single thread mode:
$ bedtools complement -g ${GENOME}.tab -i ${DOMAINS} | cat -n | awk '{OFS="\t"; print $2, $3, $4, "ID_COMP_"$1, 0, "."}' > TAD_Complement.bed
$ cat TAD_Complement.bed ${DOMAINS} | cut -f 1-6 | bedtools sort > TAD_PAD.bed
then used TAD_PAD.bed
as argument to -td
flag
Ok, this means we have somewhere a plus one
issue checking the boundary conditions.
Could #722 be related to this? I hope so :-)
This bug could be fixed with the changes in the current develop branch. Can you check this @dawe ? Thanks a lot!
Hi all, I'm running
hicDifferentialTAD
on TADs identified byhicFindTADs
on my data. I've noticed that some domains are not included in the accepted nor in the rejected list:TADs were computed like this
and differences like this
Looking at the position of the skipped TAD in the domains file, I realized they are the first and last TAD of each chromosome, which may make sense when testing for left or right inter-TAD differences, but not for intra-TAD difference.
Of note, I've tried to run
hicDifferentialTAD
with-m intra-TAD
option but I get weird results, only two domains are tested (the first 2 on chromosome 1) and this test is repeated hundreds of times.