broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Heatmap vs run.final.infercnv_obj@expr_data #529

Open jemorlanes opened 1 year ago

jemorlanes commented 1 year ago

Hello!

I have a quick question regarding some of the outputs of inferCNV. I see that the adjusted expression values from the final heatmap are different from the expression values that I find in run.final.infercnv_obj@expr_data.

Why is there a difference between these 2? The max and min of expr_data are greater than those shown in the heatmap. Where does this difference come from?

Thank you for your time! You have created an amazing package :)

GeorgescuC commented 1 year ago

Hi @jemorlanes ,

Adapting my answer from https://github.com/broadinstitute/infercnv/issues/308#issuecomment-1432245455, and will add it to the wiki FAQ since it's a sensible question.

When running plot_cnv(), there is an option called x.range that limits the min/max values that are plotted, to avoid outliers skewing the color breaks and toning down all color intensities. By default, it is set to "auto", which sets the min and max limits to the 0.01 and 0.99 quantiles, respectively, of the values in cnv@expr.data. Outliers are set to those limit values, which effectively reduces the range() of the values in the heatmap and in the .txt files.
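To make the effect concrete, here is a minimal sketch in R of the clipping described above. It is not inferCNV's actual code: the matrix `expr` is synthetic and the variable names are hypothetical. Only the 0.01 and 0.99 quantile limits come from the explanation.

```r
# Illustrative only: synthetic values standing in for the expression matrix.
set.seed(1)
expr <- matrix(rnorm(2000, mean = 1, sd = 0.1), nrow = 40)

# Limits at the 0.01 and 0.99 quantiles, as described for x.range = "auto".
limits <- unname(quantile(expr, probs = c(0.01, 0.99)))

# Clamp outliers to the limits; this is what narrows the plotted range.
clipped <- pmin(pmax(expr, limits[1]), limits[2])

range(expr)     # full range of the stored values
range(clipped)  # narrower range, comparable to what the heatmap shows
```

Comparing the two `range()` calls on your own object should show the same pattern: the stored values span more than the clipped ones.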

Regards, Christophe.

jemorlanes commented 1 year ago

I see! That makes a lot of sense :) On the same note, however, I notice that my heatmap shows a max value of 1.3 in the colour bar, but when I check the 0.99 quantile of my data, the value is 1.24.

    > quantile(st.cnv@expr.data, prob = 0.99)
         99%
    1.238855
    > quantile(st.cnv@expr.data, prob = 0.01)
          1%
    0.815113

I would understand the 0.99 quantile being rounded up, but the 0.01 quantile has me a bit confused.

GeorgescuC commented 1 year ago

Hi @jemorlanes ,

The color scale range is symmetrical around the center: it is based on the larger of the two deltas to the center, so the range used is effectively [0.761145 ; 1.238855]. You can check the details of the code here.
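The arithmetic behind that interval can be reproduced with a short R sketch. The variable names are hypothetical, and the center value of 1 is an assumption inferred from the interval quoted above rather than taken from the package code.

```r
# Quantiles reported earlier in this thread.
q_low  <- 0.815113
q_high <- 1.238855

# Assumption: the color scale is centered on 1.
center <- 1

# Use the larger of the two distances to the center on both sides.
delta <- max(center - q_low, q_high - center)
color_range <- c(center - delta, center + delta)

print(color_range)
```

Here the upper delta (0.238855) is larger than the lower one (0.184887), so the lower bound is pushed down to about 0.761145, below the 0.01 quantile.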

The histogram does seem to "bleed" a bit below 0.8, and I'm not sure why. The histogram density code is from the deprecated GMD library, not something we wrote. One way to get more insight into the exact values used (because sometimes R approximates things weirdly) would be to add the debug=TRUE option to run(). That should output the exact values used for the breaks and range.

Regards, Christophe.

jemorlanes commented 1 year ago

Ah I see! I didn't notice that the color scale was symmetrical. Thank you so much Christophe, really appreciate it :))