ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

Normalized counts or raw counts? #48

Closed by Nico-FR 1 year ago

Nico-FR commented 1 year ago

Hi, I usually use normalized counts (between 0 and 1) rather than raw (integer) counts for matrix processing. For loop analysis with FitHiC, however, we must use raw counts together with a bias file containing the normalization vector.

What do you advise for compartment analysis with your tool: normalized or raw counts? I think we should use normalized matrices to account for some biases.

Can the normalized bedgraph (using quantile normalization) in some way replace the normalization of the matrices? I think it is only useful for comparing between samples, right?

ay-lab commented 1 year ago

This is a really good question. We did an internal (unpublished) comparison between compartments called from normalized counts and from raw counts. We found that raw counts best preserve biological features such as the laminB1 signal, so we decided to go ahead with raw-count compartment analysis. In our dcHiC Nature Communications paper (https://www.nature.com/articles/s41467-022-34626-6) we captured all the relevant biological features using raw counts. Note also that during compartment calling we perform a distance normalization followed by a correlation calculation on the raw counts, which probably removes most of the biases from the data.
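To illustrate what "distance normalization followed by correlation" can look like, here is a minimal NumPy sketch. It is not dcHiC's actual code: the function names `distance_normalize` and `correlation_matrix`, the toy matrix, and the choice of an observed/expected ratio per diagonal are all assumptions made for illustration.

```python
import numpy as np

def distance_normalize(counts):
    """Observed/expected: divide each entry by the mean count at its
    genomic distance (diagonal offset). Assumes a symmetric matrix;
    the expected value is taken from the upper diagonal."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    oe = np.zeros_like(counts)
    for d in range(n):
        idx = np.arange(n - d)
        diag_mean = counts[idx, idx + d].mean()
        if diag_mean > 0:
            oe[idx, idx + d] = counts[idx, idx + d] / diag_mean
            oe[idx + d, idx] = counts[idx + d, idx] / diag_mean
    return oe

def correlation_matrix(oe):
    """Pearson correlation between the rows (bins) of the O/E matrix."""
    return np.corrcoef(oe)

# Hypothetical toy data: 6 bins, two compartments (bins 0-2 and 3-5),
# contacts decay with distance and are doubled within a compartment.
labels = np.array([0, 0, 0, 1, 1, 1])
i, j = np.indices((6, 6))
raw = 100.0 / (1 + np.abs(i - j)) * np.where(labels[i] == labels[j], 2, 1)

oe = distance_normalize(raw)
corr = correlation_matrix(oe)
```

In this toy case the distance decay is divided out of `oe`, and bins from the same compartment end up positively correlated in `corr` while bins from different compartments are negatively correlated. Bins with constant rows (for example, empty bins) would give NaN correlations and would need filtering first.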

Quantile normalization is only there to make sure there are no between-sample biases.
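For reference, textbook quantile normalization across samples can be sketched as follows. This is the generic technique, not necessarily the exact procedure dcHiC applies. The function name `quantile_normalize` and the toy scores are hypothetical, and ties are broken arbitrarily by sort order.

```python
import numpy as np

def quantile_normalize(scores):
    """Force every sample (column) to share the same value distribution.

    scores: array of shape (n_bins, n_samples), e.g. compartment scores.
    """
    scores = np.asarray(scores, dtype=float)
    # Rank of each bin within its own sample (0 = smallest value).
    ranks = np.argsort(np.argsort(scores, axis=0), axis=0)
    # Reference distribution: mean across samples at each rank.
    reference = np.sort(scores, axis=0).mean(axis=1)
    # Replace each value with the reference value of its rank.
    return reference[ranks]

# Hypothetical compartment scores for 4 bins in 2 samples.
scores = np.array([
    [0.5, 1.0],
    [-0.2, -0.6],
    [0.1, 0.3],
    [-0.4, -1.2],
])
normalized = quantile_normalize(scores)
```

Each sample keeps the ordering of its bins, but all samples now share one distribution. This makes values comparable between samples and does not correct biases within a single contact matrix.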

Nico-FR commented 1 year ago

Thank you for the clear answer.