PhanstielLab / plotgardener

https://phanstiellab.github.io/plotgardener/

readHiC observed counts calculation #88

Status: Closed (closed by chualec 1 year ago)

chualec commented 1 year ago

Hi,

I was wondering why there are so many decimal values in the matrix when I use the readHic function, even when I set the option matrix = "observed". Shouldn't the values be whole numbers reflecting the number of reads between the regions? This is especially evident for interchromosomal regions, where the function returns counts of around 0 to 1, even though I can see in Juicebox that the region has areas where the observed value is 10 or more. Could the counts column have been normalized by some sort of ratio? And is it possible to turn off the normalization so I can see the raw counts between regions?

Thanks

nekramer commented 1 year ago

Hi,

Unfortunately, this is a bit outside the scope of plotgardener. readHic is a modified wrapper around the straw() function from strawr (https://github.com/aidenlab/straw/tree/master/R). I would look at the documentation or source code there to see how the different matrix types are implemented, as I'm not entirely sure what the numbers should look like for every combination of matrix and normalization. By default, however, readHic uses straw to extract KR-normalized counts, so you are correct that the values are being normalized. If you would like to see them without normalization, you can specify norm = "NONE".
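For intuition on why normalized values stop being whole numbers, here is a minimal, language-agnostic sketch written in Python. It assumes the normalization divides each raw count by the product of the two bins' normalization-vector entries, which is my understanding of how straw applies a norm vector. The bin indices, counts, and bias values are all made up for illustration and do not come from a real .hic file.

```python
# Toy illustration of why normalized Hi-C contact values are not integers.
# ASSUMPTION: normalized = raw / (bias[i] * bias[j]); all numbers are hypothetical.

# (bin_i, bin_j) -> raw observed read-pair count (always a whole number)
raw_counts = {(0, 1): 10, (0, 2): 3, (1, 2): 7}

# Hypothetical per-bin normalization vector (one entry per bin)
bias = [4.0, 3.2, 0.9]


def normalize(counts, bias):
    """Divide each raw count by the product of its two bins' bias values."""
    return {(i, j): c / (bias[i] * bias[j]) for (i, j), c in counts.items()}


normalized = normalize(raw_counts, bias)

for key in sorted(raw_counts):
    print(key, "raw:", raw_counts[key], "normalized:", round(normalized[key], 4))
```

With bias values like these, a raw count of 10 ends up below 1 after normalization, which is at least consistent with the kind of discrepancy described above. Whether that is what happens in a particular file depends on its actual normalization vectors.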

Best, Nicole

chualec commented 1 year ago

Hi Nicole,

Thanks, this was exactly what I was looking for.