BlanchetteLab / HIFI

Hi-C Interaction Frequency Inference (HIFI): High-resolution estimation of DNA-DNA interaction frequency from Hi-C data

Is HIFI limited to PE Illumina sequencing data? #9

Open jychoilab opened 3 years ago

jychoilab commented 3 years ago

Hi

Thanks for a great method. I'm working on chromosome contact data generated from a long read sequencing method. Check link here if interested: https://www.biorxiv.org/content/10.1101/833590v1

Basically, it pulls down the chromosome contact complex, ligates everything together, and sequences the whole thing on a long-read sequencing platform. You can see that this differs quite a bit from a Hi-C matrix generated through Illumina sequencing. I imagine bamtosparsematrix.py is where this question really matters: does the conversion from BAM to a sparse matrix assume a BAM file based on PE sequencing data?

Thank you for any advice.

zhyanlin commented 3 years ago

In theory, HIFI can be applied to contact maps derived from various experiments. bamtosparsematrix.py is a preprocessing tool that produces input for HIFI; it was developed and tested on a standard Hi-C experiment. You can bypass bamtosparsematrix.py by generating the contact map with any tool you see fit.

jychoilab commented 3 years ago

Thanks for the reply. Could you explain the format that bamtosparsematrix.py writes and that HIFI takes as input? Is it related to any other common format? I have ways to convert my BAM to pairs, hic, or coo format.

zhyanlin commented 3 years ago

You can start from the example dataset first. The output of bamtosparsematrix.py (i.e. input for HIFI) is:

# 30521 32940 30522 32941
11 chr9 30521 chr9 30522
...

The first line gives the start and end of the rows and columns. Each additional line describes one contact pair (a,b) at fragment level, in the form: counts chrom_a frag_a chrom_b frag_b. If you prefer a fixed-bin contact map, you can replace frag_a and frag_b with bin indices.
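For anyone coming from pairs-style data, here is a minimal sketch of how the fixed-bin variant described above could be written out. It is not part of HIFI: the function names `bin_contacts` and `format_sparse_matrix` are made up for illustration. It also rests on assumptions not confirmed in this thread: that the header order is row start, row end, column start, column end; that the end values are inclusive; and that fields are separated by a single space, as in the example line. Please check these against the example dataset before relying on it.

```python
from collections import Counter


def bin_contacts(pairs, bin_size):
    """Count contacts per (chrom_a, bin_a, chrom_b, bin_b).

    `pairs` is an iterable of (chrom_a, pos_a, chrom_b, pos_b) tuples,
    e.g. parsed from a pairs file. Bin indices are pos // bin_size.
    """
    counts = Counter()
    for chrom_a, pos_a, chrom_b, pos_b in pairs:
        counts[(chrom_a, pos_a // bin_size, chrom_b, pos_b // bin_size)] += 1
    return counts


def format_sparse_matrix(counts, sep=" "):
    """Render binned counts as text in the layout quoted above.

    ASSUMPTIONS (not confirmed by the HIFI authors here):
      - header is "# row_start row_end col_start col_end"
      - the end values are inclusive
      - fields are separated by `sep` (a single space by default)
    """
    if not counts:
        raise ValueError("no contacts to write")
    rows = [key[1] for key in counts]
    cols = [key[3] for key in counts]
    header_values = (min(rows), max(rows), min(cols), max(cols))
    lines = [sep.join(["#"] + [str(v) for v in header_values])]
    # One line per contact pair: counts chrom_a bin_a chrom_b bin_b
    for (chrom_a, bin_a, chrom_b, bin_b), n in sorted(counts.items()):
        lines.append(sep.join([str(n), chrom_a, str(bin_a), chrom_b, str(bin_b)]))
    return "\n".join(lines) + "\n"


# Toy input: three contacts on chr9, binned at 1 kb.
example_pairs = [
    ("chr9", 1050, "chr9", 2100),
    ("chr9", 1900, "chr9", 2999),
    ("chr9", 3000, "chr9", 5000),
]
example_text = format_sparse_matrix(bin_contacts(example_pairs, 1000))
```

Writing `example_text` to a file gives a header line followed by one line per binned contact pair. Whether HIFI accepts it unchanged depends on the assumptions listed above.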