Closed by lauhinojosa 1 year ago
Thanks a lot for raising the issue and providing the solution.
I was still getting this error (chrchr1) after changing lines 154 and 175. Changing line 131 resolved the problem, from
chrTag = "chr" + chrom.name
to
chrTag = chrom.name
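A variant of this change that may work for both naming schemes is to add the prefix only when it is missing. The sketch below is not taken from preprocess.py: `ensure_chr_prefix` is a hypothetical helper, and it assumes names are either bare ("1", "X") or already prefixed ("chr1", "chrX").

```python
def ensure_chr_prefix(name: str) -> str:
    """Return the name with a leading 'chr', adding it only if absent (hypothetical helper)."""
    # Compare case-insensitively so "Chr1" or "CHR1" is not prefixed a second time.
    if name.lower().startswith("chr"):
        return name
    return "chr" + name


print(ensure_chr_prefix("1"))     # chr1
print(ensure_chr_prefix("chr1"))  # chr1
```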
Hello,
When using utilities/preprocess.py to format .hic files, I ran into the following issue:
Description of the Issue
python preprocess.py -input hic -file sample1.hic -res 1000 -prefix preprocessed/sample1 -removeChr chrm,chry
The output was:
Additionally, the output bed file did not have the correct format.
Solution
The incorrect bed format occurs because the script prepends 'chr' to each chromosome name: it expects the names in the .hic file to be '1', '2', '3', etc. rather than 'chr1', 'chr2', etc. I solved this by changing line 154 to:
bedfile.write(chrs[a] + "\t" + (str(posIterator-res)) + "\t" + str(posIterator) + "\t" + str(iterator) + "\n") #removed 'chr'
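For illustration, the same row can be built with an f-string. This sketch assumes, as the line above suggests, that posIterator is the end coordinate of the bin, res is the bin size, and iterator is the bin index. `bed_line` is a hypothetical helper, not part of preprocess.py.

```python
def bed_line(chrom: str, bin_end: int, res: int, bin_index: int) -> str:
    """Format one four-column, tab-separated BED row for a bin."""
    # chrom is written unchanged, so "chr1" stays "chr1" instead of becoming "chrchr1".
    return f"{chrom}\t{bin_end - res}\t{bin_end}\t{bin_index}\n"


print(repr(bed_line("chr1", 1000, 1000, 0)))  # 'chr1\t0\t1000\t0\n'
```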
Likewise, the not-found error occurs because the script passes the chromosome number, not the chromosome name, to hicstraw.straw() on line 175. In this case, what it passes is the empty string that results from
chrNum = chr.split("chr")[1]
on line 173. I changed line 175 to:
result = hicstraw.straw("observed", 'NONE', results.file, chr, chr, 'BP', res)
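The empty string is easy to reproduce in isolation. The snippet below only uses str.split. The input names are illustrative, with "chrchr1" standing in for a name that received a second prefix.

```python
# How name.split("chr")[1] behaves for different chromosome names
for name in ("chr1", "chrchr1"):
    parts = name.split("chr")
    print(name, parts, repr(parts[1]))
# chr1 ['', '1'] '1'
# chrchr1 ['', '', '1'] ''

# A bare name contains no "chr" separator, so index 1 does not exist
try:
    "1".split("chr")[1]
except IndexError:
    print("IndexError for a bare name")
```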
If anyone else is having this issue, these two changes are enough to solve it. A future version of the script should handle different chromosome naming schemes.
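One way a future version could handle both schemes is to match each requested name against whatever names the .hic file itself reports. The following is a minimal sketch under that assumption. `strip_chr`, `resolve_chrom`, and the example name lists are hypothetical and not part of preprocess.py or hicstraw.

```python
from typing import Sequence


def strip_chr(name: str) -> str:
    """Drop one leading 'chr' (any case) so names compare by their bare label."""
    return name[3:] if name.lower().startswith("chr") else name


def resolve_chrom(requested: str, available: Sequence[str]) -> str:
    """Return the entry of `available` that names the same chromosome as `requested`.

    Matching ignores the 'chr' prefix and letter case. Raises KeyError if nothing matches.
    """
    key = strip_chr(requested).lower()
    for name in available:
        if strip_chr(name).lower() == key:
            return name
    raise KeyError(f"chromosome {requested!r} not found in {list(available)}")


print(resolve_chrom("chr1", ["1", "2", "X"]))        # 1
print(resolve_chrom("1", ["chr1", "chr2", "chrX"]))  # chr1
print(resolve_chrom("chrx", ["chr1", "chrX"]))       # chrX
```

The resolved name could then be used both in the bed file and in the hicstraw.straw() call, so that the two stay consistent with the file. Case-insensitive matching is included because options such as -removeChr chrm,chry are given in lower case.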
Thank you,
Laura Hinojosa