ay-lab / mustache

Multi-scale Detection of Chromatin Loops from Hi-C and Micro-C Maps using Scale-Space Representation
MIT License

diff_mustache.py crashing on input file format #25

Closed esebesty closed 2 years ago

esebesty commented 3 years ago

I'm trying to run diff_mustache.py on two hic files with the example command line, and getting the following error:

The distance limit is set to 2000000bp
Traceback (most recent call last):
  File "/home/user2031/work/repos/bcro/bit-bio/mustache/mustache/diff_mustache.py", line 1438, in <module>
    main()
  File "/home/user2031/work/repos/bcro/bit-bio/mustache/mustache/diff_mustache.py", line 1323, in main
    chrs, resolutions, masterindex, genome, metadata = read_header(hic)
  File "/home/user2031/work/repos/bcro/bit-bio/mustache/mustache/diff_mustache.py", line 277, in read_header
    key = readcstr(req)
  File "/home/user2031/work/repos/bcro/bit-bio/mustache/mustache/diff_mustache.py", line 237, in readcstr
    return buf.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 0: invalid start byte

Looks like the hic format is wrong. I tried to use the allValidPairs output of HiC-Pro, converted with Juicer. Should I use something else?
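For readers hitting the same traceback, here is a minimal sketch of how a NUL-terminated string reader can end up raising this exact error. The function `read_cstr` and its body are assumptions written for illustration, not the code in diff_mustache.py. It shows that the decode fails as soon as the reader is pointed at bytes that are not text, which is what happens when a parser's expected header layout does not match the file.

```python
import io


def read_cstr(stream):
    """Read bytes up to (not including) a NUL terminator and decode them as UTF-8.

    Hypothetical helper for illustration only.
    """
    buf = bytearray()
    while True:
        b = stream.read(1)
        if b == b"":
            raise EOFError("stream ended before a NUL terminator was found")
        if b == b"\0":
            break
        buf += b
    return buf.decode("utf-8")


# A well-formed string field decodes normally.
print(read_cstr(io.BytesIO(b"HIC\0")))

# If the reader is misaligned and lands on binary data, decoding fails.
# 0xfd can never start a UTF-8 sequence, so the error points at position 0.
try:
    read_cstr(io.BytesIO(b"\xfd\x03\0"))
except UnicodeDecodeError as err:
    print("decode failed:", err)
```

So the error does not necessarily mean the file is corrupt. It may only mean the reader and the file disagree about where the string fields sit in the header.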

ay-lab commented 3 years ago

Hi, thanks for using our tool. As you mentioned, it looks like a file format problem. Just to make sure, did you pass in the .hic file or the validPairs file?

esebesty commented 3 years ago

I used the hic file, converted with juicer_tools.2.10.01.jar.

ay-lab commented 3 years ago

I see. I haven't tried version 2 yet, but I will soon. Do you have any .hic files from older versions of Juicer, so we can see whether you get a similar error (https://github.com/aidenlab/juicer/wiki/Download)?

aperl0401 commented 3 years ago

Any updates on this, by chance? I'm encountering similar issues using .hic files generated with Juicer 1.5.7. (I had no issues with loop calling on a single sample, and am very interested in this new aspect of the tool!)

roayaei commented 3 years ago

Can you please provide some sample data that I can use to reproduce the error? The problem is clearly in reading the header.
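A quick way to narrow down a header problem is to look at the first few bytes of the file. The sketch below assumes the usual .hic layout, in which the file starts with the NUL-terminated magic string "HIC" followed by a little-endian 32-bit version number. The function name `peek_hic_version` is made up for this example, and the demo file is synthetic rather than a real contact map.

```python
import os
import struct
import tempfile


def peek_hic_version(path):
    """Return the format version stored at the start of a .hic file.

    Assumes the header begins with b"HIC\\0" followed by a little-endian int32.
    """
    with open(path, "rb") as fh:
        magic = fh.read(4)
        if magic != b"HIC\0":
            raise ValueError(f"not a .hic file, unexpected magic bytes {magic!r}")
        raw = fh.read(4)
        if len(raw) != 4:
            raise ValueError("file is too short to contain a version field")
        (version,) = struct.unpack("<i", raw)
    return version


# Demo on a synthetic 8-byte header (not a usable .hic file).
with tempfile.NamedTemporaryFile(delete=False, suffix=".hic") as tmp:
    tmp.write(b"HIC\0" + struct.pack("<i", 9))
    demo_path = tmp.name

print(peek_hic_version(demo_path))  # prints 9
os.remove(demo_path)
```

If the reported version is higher than the one the reader was written for, a header layout mismatch is a plausible cause, since newer Juicer Tools releases are reported to write a newer format version with extra header fields. That is worth confirming against the .hic format documentation before drawing conclusions.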

esebesty commented 3 years ago

The original data coming from HiC-Pro looks like this:

A00489:843:HY7JCDMXX:1:2224:1325:34632  chr1    10013   +       chr2    22058715        +       14215   HIC_chr1_1      HIC_chr2_6344   30      42
A00489:843:HY7JCDMXX:1:1334:7473:27931  chr1    10465   +       chr12   94978   +       56899   HIC_chr1_1      HIC_chr12_12    39      26
A00489:843:HY7JCDMXX:1:2369:30291:14685 chr1    12995   +       chr8    85091089        -       7882    HIC_chr1_1      HIC_chr8_24813  31      42
A00489:843:HY7JCDMXX:1:1110:3658:5885   chr1    13087   -       chr18   535565  -       13299   HIC_chr1_1      HIC_chr18_155   31      42
A00489:843:HY7JCDMXX:1:1320:14841:3302  chr1    13112   -       chr2    114357318       +       25044   HIC_chr1_1      HIC_chr2_32049  31      11
A00489:843:HY7JCDMXX:1:1408:30689:33066 chr1    13113   -       chr6    82583725        +       15259   HIC_chr1_1      HIC_chr6_23792  31      26
A00489:843:HY7JCDMXX:1:2175:20528:7576  chr1    13138   -       chr6    1431871 -       22909   HIC_chr1_1      HIC_chr6_401    31      37
A00489:843:HY7JCDMXX:1:1258:7346:25801  chr1    13241   +       chr5    180667687       +       9283    HIC_chr1_1      HIC_chr5_55021  31      42
A00489:843:HY7JCDMXX:1:1307:18647:9048  chr1    13424   +       chr2    114357293       +       14542   HIC_chr1_1      HIC_chr2_32049  31      11
A00489:843:HY7JCDMXX:1:1168:30644:13761 chr1    13434   +       chr2    70708307        -       6662    HIC_chr1_1      HIC_chr2_20899  31      40
A00489:843:HY7JCDMXX:1:2419:16432:32142 chr1    13435   +       chr1    55495770        +       13363   HIC_chr1_1      HIC_chr1_12772  31      42
A00489:843:HY7JCDMXX:1:2327:26115:18912 chr1    13444   +       chr9    135072969       -       14620   HIC_chr1_1      HIC_chr9_33555  31      42
A00489:843:HY7JCDMXX:1:1412:7907:14779  chr1    13444   +       chr11   429933  -       4128    HIC_chr1_1      HIC_chr11_82    31      40
A00489:843:HY7JCDMXX:1:2222:10655:6793  chr1    13455   +       chr18   3012618 -       8122    HIC_chr1_1      HIC_chr18_903   31      42
A00489:843:HY7JCDMXX:1:2181:3459:1376   chr1    13519   -       chr5    167825867       +       15148   HIC_chr1_1      HIC_chr5_51772  31      42
A00489:843:HY7JCDMXX:1:1253:19126:5682  chr1    13525   -       chr9    3769434 -       17698   HIC_chr1_1      HIC_chr9_1179   31      23
A00489:843:HY7JCDMXX:1:1204:10755:25410 chr1    13532   -       chr15   91093001        -       23248   HIC_chr1_1      HIC_chr15_19616 31      23
A00489:843:HY7JCDMXX:1:2428:2645:28181  chr1    13544   -       chr7    1637248 +       14223   HIC_chr1_1      HIC_chr7_245    31      40

It has no header. I'm using the hicpro2juicebox.sh script to convert the file to .hic format, but I already had issues with the conversion when trying to use the restriction fragment size file. See here and here.
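Before converting, it can help to confirm that every line of the pairs file has the shape shown above. The sketch below parses one such line into named fields. The field names are my own labels, inferred from the sample (read name, two chromosome/position/strand triples, a fragment size, two fragment names, two mapping qualities), and are not taken from HiC-Pro's documentation. It splits on any whitespace, so it works on both tab-separated files and the pasted sample.

```python
from typing import NamedTuple


class ValidPair(NamedTuple):
    # Field names are illustrative labels, not official HiC-Pro names.
    read_name: str
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str
    frag_size: int
    frag1: str
    frag2: str
    mapq1: int
    mapq2: int


def parse_valid_pair(line):
    """Parse one whitespace-separated pairs line; raise ValueError if malformed."""
    f = line.split()
    if len(f) < 12:
        raise ValueError(f"expected at least 12 fields, got {len(f)}")
    if f[3] not in "+-" or f[6] not in "+-" or len(f[3]) != 1 or len(f[6]) != 1:
        raise ValueError(f"unexpected strand values: {f[3]!r}, {f[6]!r}")
    # int() raises ValueError on non-numeric positions, sizes, or qualities.
    return ValidPair(f[0], f[1], int(f[2]), f[3], f[4], int(f[5]), f[6],
                     int(f[7]), f[8], f[9], int(f[10]), int(f[11]))


line = ("A00489:843:HY7JCDMXX:1:2419:16432:32142 chr1 13435 + "
        "chr1 55495770 + 13363 HIC_chr1_1 HIC_chr1_12772 31 42")
pair = parse_valid_pair(line)
print(pair.chrom1 == pair.chrom2, pair.pos2 - pair.pos1)
```

Running this over the whole file and counting the lines that raise `ValueError` gives a rough check that the input to the conversion script is well formed, which helps separate conversion problems from problems in the .hic reader.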