bxlab / hifive

Tools for handling HiC and 5C data
MIT License

Error when changing resolutions in quasar #16

Closed Jan12938 closed 5 years ago

Jan12938 commented 5 years ago

Hi,

I used quasar to get quality scores for my Hi-C project at resolutions of 1 Mb and 40 Kb, which worked fine. When I changed the resolutions to 500 Mb and 200 Mb, quasar returned this error:

Coverage 12426421 Resolution 200000 Chrom scaffold_12 - Normalizing counts
Traceback (most recent call last):
  File "/home/jawen108/.local/bin/hifive", line 849, in <module>
    main()
  File "/home/jawen108/.local/bin/hifive", line 93, in main
    run(args)
  File "/home/jawen108/.local/lib/python2.7/site-packages/hifive/commands/find_quasar_scores.py", line 114, in run
    coverages=args.coverages, seed=args.seed)
  File "/home/jawen108/.local/lib/python2.7/site-packages/hifive/quasar.py", line 302, in find_transformation
    norm, dist, valid_rows = self._normalize(chrom, raw[indices[h]:indices[h + 1]], mids[chrom], res)
  File "/home/jawen108/.local/lib/python2.7/site-packages/hifive/quasar.py", line 618, in _normalize
    curr_binsize = mids[1] - mids[0]
IndexError: index 1 is out of bounds for axis 0 with size 1

I looked through the code to find out what's wrong. Clearly the mids array holds only one element here, so it cannot be indexed at position 1.

However, mids is created from the hic.fends['fends']['mid'][...] object, which I find hard to understand (see below):

else:
    temp_mids = hic.fends['fends']['mid'][...]
    chr_indices = hic.fends['chr_indices'][...]
....
    raw[indices[i]:indices[i + 1], :] = temp
    mids[chrom] = temp_mids[chr_indices[chrint]:chr_indices[chrint + 1]]

I would be thankful if somebody could describe what happens with mids and how changing the resolution affects it.

Thank you for your help in advance.

Jan Wendt

msauria commented 5 years ago

Hi Jan,

mids has two different definitions in the course of the code. Initially, it is the midpoint of each fragment end (fend). Once the resolution is adjusted, it becomes the midpoint of each bin. The error you are encountering is because I used the distance between the first two bin mids to determine the bin size, rather than explicitly passing the value. However, with 500Mb and 200Mb resolutions, most of the chromosomes have only one bin, hence the out of bounds error. This is something that I should fix. However, those resolutions are too low to be meaningful, since quasar only acts on cis interactions for calculating the quality score. I would recommend setting your lower-limit resolution at 2-4Mb.
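To make the failure mode concrete, here is a minimal sketch in plain Python, not the actual hifive code. The helpers `bin_midpoints`, `inferred_binsize` and `safe_binsize`, and the fend positions, are made up for illustration, and plain lists stand in for the NumPy arrays hifive uses. It assumes fixed-size bins that cover the span of the fend midpoints.

```python
def bin_midpoints(fend_mids, resolution):
    """Return the midpoints of the fixed-size bins spanning the fend midpoints.

    Hypothetical helper: hifive's own binning may differ in detail.
    """
    start = (min(fend_mids) // resolution) * resolution
    stop = (max(fend_mids) // resolution + 1) * resolution
    return [lo + resolution // 2 for lo in range(start, stop, resolution)]


def inferred_binsize(mids):
    # Same pattern as the line in the traceback: needs at least two bins.
    return mids[1] - mids[0]


def safe_binsize(mids, resolution):
    # One possible guard: use the requested resolution when only one bin exists.
    return mids[1] - mids[0] if len(mids) > 1 else resolution


# Made-up fend midpoints on a small scaffold (in bp).
fend_mids = [1200, 48000, 150500, 310000, 720000]

fine = bin_midpoints(fend_mids, 40000)      # many bins, inference works
coarse = bin_midpoints(fend_mids, 1000000)  # a single bin covers the scaffold

try:
    inferred_binsize(coarse)
    failed = False
except IndexError:
    # With one bin there is no mids[1], as in the reported error.
    failed = True
```

With the coarse resolution the whole scaffold falls into one bin, so inferring the bin size from the first two midpoints raises `IndexError`, while passing the resolution explicitly (as in `safe_binsize`) avoids the lookup altogether.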

Jan12938 commented 5 years ago

Thank you for the quick answer. I understand now what created the problem, and how to avoid it.

Best wishes, Jan