BlanchetteLab / HIFI

Hi-C Interaction Frequency Inference (HIFI): High-resolution estimation of DNA-DNA interaction frequency from Hi-C data

Starting with sparse matrix as input to HIFI? #2

Closed by jstansfield0 5 years ago

jstansfield0 commented 6 years ago

I am wondering whether it is possible to start with a sparse matrix instead of a BAM file as input to HIFI. Looking at the input for the HIFI function, it seems this is possible, but based on your example I am not sure what format the sparse matrix needs to be in.

head Rao_GM12878.hg19.chr9_example.chr9_chr9.RF.HIFI_MRF.tsv
# 30521 32940 30522 32941
18543762.207 chr9 30521 chr9 30522
28820294.189 chr9 30521 chr9 30523
20672537.231 chr9 30521 chr9 30524
4884029.770 chr9 30521 chr9 30525
7376499.939 chr9 30521 chr9 30526
1907192.993 chr9 30521 chr9 30527
1937146.378 chr9 30521 chr9 30528
7150893.402 chr9 30521 chr9 30529
3340169.525 chr9 30521 chr9 30530

Is the column order IF (interaction frequency), chr1, start1, chr2, start2? Also, should the start locations be in base pairs, or are these IDs that are later mapped to base-pair locations?

ccameron commented 5 years ago

Yes, the input to HIFI is a sparse matrix file. The Python script is provided to allow users to convert their Hi-C BAM files to sparse matrix files.

Please see the following post for an explanation of the sparse matrix format: https://github.com/BlanchetteLab/HIFI/issues/1
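
For anyone who wants to inspect such a file before reading the linked post, here is a minimal Python sketch that loads the layout visible in the `head` output quoted in the question: one `#` line with four integers, then rows of five whitespace-separated fields. The field names (`value`, `chrom1`, `id1`, `chrom2`, `id2`) and the function `read_sparse_matrix` are assumptions made for illustration. This thread does not confirm what each column means, or whether the integers are IDs or base-pair positions, so please check issue #1 before relying on any interpretation.

```python
import io
from typing import List, NamedTuple, TextIO, Tuple


class Row(NamedTuple):
    # Field names are hypothetical; the thread does not define the columns.
    value: float
    chrom1: str
    id1: int
    chrom2: str
    id2: int


def read_sparse_matrix(handle: TextIO) -> Tuple[List[int], List[Row]]:
    """Read the '#' header integers and the five-field rows from a text stream."""
    header: List[int] = []
    rows: List[Row] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # Keep the header integers as-is, without interpreting them.
            header = [int(token) for token in line[1:].split()]
            continue
        value, chrom1, id1, chrom2, id2 = line.split()
        rows.append(Row(float(value), chrom1, int(id1), chrom2, int(id2)))
    return header, rows


# First lines of the example quoted in the question.
example = """# 30521 32940 30522 32941
18543762.207 chr9 30521 chr9 30522
28820294.189 chr9 30521 chr9 30523
"""

header, rows = read_sparse_matrix(io.StringIO(example))
print(header)
print(rows[0])
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper, for example `with open(path) as fh: header, rows = read_sparse_matrix(fh)`.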