Closed jychoilab closed 4 years ago
Hi, please see my responses below. This should be quite straightforward, given that you seem to have all the right fields in hand already.
On Sun, Feb 9, 2020 at 10:49 AM cjy8709 notifications@github.com wrote:
Dear fithic developers
I have a question about how to convert cooler (.cool) or pairix files into the fithic input format. The main reason I have cool (or pairix) files is that I'm working on an experimental long-read-based Hi-C procedure, so standard Hi-C pipelines like HiC-Pro are not relevant for me, and I don't think I can use the HiCPro2FitHiC scripts provided by the fithic package. Perhaps I could ask some clarifying questions, so that the answers can help people who have cool files and want to use fithic?
So basically what I'm trying to do is convert the .cool file into the interactions and fragments files for fithic. It seems the interactions file can be created with the cooler dump function, specifically
cooler dump --join fubar.cool
which will give you a 7-column "matrix" where the first three columns represent bin_i_chromosome, bin_i_start and bin_i_end, the next three give the interacting bin, and the last column is the contact count between the two bins. So if you average the start and end of each bin, I think the result would correspond to fragmentMid1 and fragmentMid2 of the interactions file. My first question is: it seems you don't count the diagonal cells in the interactions file? In other words, the contact count for bin_i vs. bin_i should not be in the interactions file?
Exactly, you can simply create fragment midpoints by taking the midpoint of the start and end of each locus in each line. Honestly, if you want, simply pick the start or the end; as long as what you have in the interactionCounts file matches the fragments file, that should all be fine. Yes, we do not expect the diagonal cells in the file.
For the fragments file, I think the 2nd and 5th columns would hold some dummy values (as it doesn't matter what they are?), and I don't think they are easily found in a cool file. So the marginalizedContactCount column is the most important one to fill, and based on the description it seems to be just the sum of all contacts for a given bin? So if one had an N x N matrix representing the Hi-C matrix, the marginalizedContactCount could just be the sum of each row?
Correct, some columns are dummy values, and just the sum of each row would be fine for that specific field. And that marginalizedContactCount field (is that really what we call it? sounds weird :)) is only relevant when and if you filter out fragments, so you can even simply set it to 1 for all loci you want to consider and 0 for all the others you want to discard, and that should be fine.
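For anyone landing here with the same question, the conversion described above can be sketched roughly as follows. This is only an illustration, not code from the fithic package: the function name `pixels_to_interactions` is made up, and it assumes each input line is one tab-separated row of `cooler dump --join` output with the 7 columns in the order described above. The output order (chr1, fragmentMid1, chr2, fragmentMid2, contactCount) is an assumption about the interactions file layout, so please check it against the fithic README.

```python
def pixels_to_interactions(lines):
    """Turn `cooler dump --join` rows into interaction rows.

    Assumed input columns (whitespace-separated):
        chrom1 start1 end1 chrom2 start2 end2 count
    Yields (chr1, mid1, chr2, mid2, count) and skips diagonal cells.
    """
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        chrom1, start1, end1, chrom2, start2, end2, count = line.split()[:7]
        # Bin midpoint = average of start and end (integer division).
        mid1 = (int(start1) + int(end1)) // 2
        mid2 = (int(start2) + int(end2)) // 2
        # bin_i vs. bin_i is a diagonal cell, which is not expected in the file.
        if chrom1 == chrom2 and mid1 == mid2:
            continue
        yield chrom1, mid1, chrom2, mid2, int(count)


if __name__ == "__main__":
    example = [
        "chr1\t0\t10000\tchr1\t0\t10000\t5",      # diagonal, dropped
        "chr1\t0\t10000\tchr1\t10000\t20000\t3",  # kept
    ]
    for row in pixels_to_interactions(example):
        print("\t".join(map(str, row)))
```

The rows can be fed in from a saved dump file or from a pipe; the only thing that matters is that the midpoints computed here are the same ones used in the fragments file.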
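The row-sum idea can be sketched in the same way. Again, this is a hypothetical helper (`pixels_to_fragments` is not part of fithic), and it rests on a few assumptions: the input is `cooler dump --join` output as above; the dump lists each bin pair only once (upper triangle), so an off-diagonal count is added to both bins to get the full row sum; the fragments file has 5 columns with the midpoint in column 3 and marginalizedContactCount in column 4; and 0 is an arbitrary placeholder for the dummy 2nd and 5th columns.

```python
from collections import defaultdict


def pixels_to_fragments(lines, dummy=0):
    """Sum contacts per bin from `cooler dump --join` rows.

    Returns sorted rows (chr, dummy, mid, marginalizedContactCount, dummy).
    Bins that never appear in the dump (no contacts at all) are not emitted.
    """
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        chrom1, start1, end1, chrom2, start2, end2, count = line.split()[:7]
        bin1 = (chrom1, (int(start1) + int(end1)) // 2)
        bin2 = (chrom2, (int(start2) + int(end2)) // 2)
        totals[bin1] += int(count)
        # Each pair is listed once, so credit the partner bin as well;
        # a diagonal cell is counted only once.
        if bin2 != bin1:
            totals[bin2] += int(count)
    return [
        (chrom, dummy, mid, total, dummy)
        for (chrom, mid), total in sorted(totals.items())
    ]


if __name__ == "__main__":
    example = [
        "chr1\t0\t10000\tchr1\t0\t10000\t5",
        "chr1\t0\t10000\tchr1\t10000\t20000\t3",
    ]
    for row in pixels_to_fragments(example):
        print("\t".join(map(str, row)))
```

If the 1/0 filtering trick mentioned above is preferred, the fourth column can simply be overwritten with 1 for bins to keep and 0 for bins to discard.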
Thank you for the help.
Cool, really appreciate the quick reply!