BlanchetteLab / HIFI

Hi-C Interaction Frequency Inference (HIFI): High-resolution estimation of DNA-DNA interaction frequency from Hi-C data

Little question about the output of BAMtoSparseMatrix.py #1

Closed by Linhua-Sun 5 years ago

Linhua-Sun commented 6 years ago

Hi, I want to get the raw sparse Hi-C matrix from a BAM file using BAMtoSparseMatrix.py, but I am confused by the output file. Is there a detailed explanation of each column in the file, such as which column is the index and which column is the count?

head Col22_Lib1_Lane1_TAIR10_chr.bwt2pairs.Chr5_Chr5.RF.tsv
# 0 97499 0 97499
41  Chr5    0   Chr5    0
30  Chr5    0   Chr5    1
6   Chr5    0   Chr5    2
1   Chr5    0   Chr5    4
2   Chr5    0   Chr5    6
1   Chr5    0   Chr5    7
1   Chr5    0   Chr5    9
1   Chr5    0   Chr5    10
3   Chr5    0   Chr5    12

Thank you!

ccameron commented 5 years ago

Hi Linhua-Sun,

The sparse matrix format is as follows:

# min_row_frag max_row_frag min_col_frag max_col_frag
frequency row_chrom row_frag col_chrom col_frag

Each non-header line describes the interaction frequency ('frequency') between two fragments ('row_frag' and 'col_frag'). Each fragment is indexed by its zero-based position along its respective chromosome ('row_chrom' and 'col_chrom'), as found in the expected digest file (see 'examples/hg19.HindIII_fragments.bed'). The header line (starting with '#') describes the dimensions of the interaction matrix contained in the sparse matrix file: it lists the minimum and maximum fragment indices found for the rows ('min/max_row_frag') and for the columns ('min/max_col_frag'). In your output, for example, the line '30 Chr5 0 Chr5 1' gives a frequency of 30 between fragments 0 and 1 of Chr5.
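To make the column layout concrete, here is a minimal Python sketch of a reader for this format. It is not part of HIFI: the function name `read_sparse_matrix` is hypothetical, and the sketch assumes whitespace-separated fields, a single '#' header line, and frequencies that can be parsed as floats.

```python
import io


def read_sparse_matrix(handle):
    """Parse a sparse matrix with one '#' header line followed by data rows.

    Hypothetical helper for illustration only; not part of HIFI.
    Returns (bounds, records), where bounds is
    (min_row_frag, max_row_frag, min_col_frag, max_col_frag).
    """
    bounds = None
    records = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header: min_row_frag max_row_frag min_col_frag max_col_frag
            bounds = tuple(int(x) for x in line.lstrip("#").split())
            continue
        # Data row: frequency row_chrom row_frag col_chrom col_frag
        freq, row_chrom, row_frag, col_chrom, col_frag = line.split()
        records.append(
            (float(freq), row_chrom, int(row_frag), col_chrom, int(col_frag))
        )
    return bounds, records


# Small in-memory example taken from the output quoted in the question.
example = io.StringIO(
    "# 0 97499 0 97499\n"
    "41\tChr5\t0\tChr5\t0\n"
    "30\tChr5\t0\tChr5\t1\n"
)
bounds, records = read_sparse_matrix(example)
```

For a real file, pass an open file handle instead of the `io.StringIO` object. The frequency is parsed as a float here only so that the same reader would also accept non-integer values.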