DHUDBlab / scDSC

Question regarding the Pearson correlation calculation #3

Open RemyLau opened 1 year ago

RemyLau commented 1 year ago

Dear author @DHUDBlab, I find your work quite interesting. However, I'm a bit confused about the Pearson correlation computation implemented in your code base:

https://github.com/DHUDBlab/scDSC/blob/1247a63aac17bdfb9cd833e3dbe175c4c92c26be/MTAB/calcu_graph_mtab.py#L29-L32

If I'm not mistaken, this is inconsistent with the definition of Pearson correlation shown in your paper.

[image: the Pearson correlation formula from the paper]
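For reference, the standard Pearson correlation between two cells $i$ and $j$, each measured over $n$ features, can be written as below. The notation ($x_{ik}$ for feature $k$ of cell $i$, $\bar{x}_i$ for the mean of row $i$) is an assumption and may differ from the symbols used in the paper.

```latex
r_{ij} = \frac{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}
              {\sqrt{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)^2}\,
               \sqrt{\sum_{k=1}^{n} (x_{jk} - \bar{x}_j)^2}},
\qquad
\bar{x}_i = \frac{1}{n} \sum_{k=1}^{n} x_{ik}
```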

In particular,

  1. The features are centered using the mean of the whole feature matrix. However, if you want the correlation between cells, i.e., rows, each row should be centered by its own mean.
  2. np.linalg.norm, given a 2D array without any keyword arguments, returns the Frobenius norm of the whole matrix, which is a single scalar. So the denominator in your computation is inconsistent with the formula provided, which needs a separate norm for each row.
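Both points can be seen on a tiny example. The sketch below uses a hypothetical 2x2 matrix `X` (not taken from the repository) to contrast the whole-matrix mean and Frobenius norm with their per-row counterparts.

```python
import numpy as np

# Hypothetical toy matrix: 2 cells (rows) x 2 features (columns).
X = np.array([[3.0, 4.0],
              [6.0, 8.0]])

# Centering: one mean for the whole matrix vs. one mean per row.
global_mean = X.mean()                    # scalar over all entries
row_means = X.mean(axis=1, keepdims=True) # shape (2, 1), one value per row

# Norms: with no keyword arguments, a 2D input gives the Frobenius norm.
fro_norm = np.linalg.norm(X)              # scalar: sqrt(3^2 + 4^2 + 6^2 + 8^2)
row_norms = np.linalg.norm(X, axis=1)     # shape (2,), one L2 norm per row

print(global_mean, row_means.ravel())
print(fro_norm, row_norms)
```

Here `fro_norm` is a single number for the whole matrix, while `row_norms` holds one value per cell, which is what a per-pair denominator requires.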

I suggest the following modification, which faithfully implements the intended Pearson correlation calculation:

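# Standardize each row (cell): subtract its own mean, divide by its own (population) std.
normalized_features = (features - features.mean(1, keepdims=True)) / features.std(1, keepdims=True)
# Averaged dot products of standardized rows give the cell-by-cell Pearson correlation.
s = np.dot(normalized_features, normalized_features.T) / normalized_features.shape[1]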

You can check that the diagonal entries of s are indeed all ones (up to floating-point error).
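One way to run that check is to compare against `np.corrcoef`, which treats each row as a variable by default. This is a minimal sketch on random data; the shape (5 cells by 20 features) and the seed are arbitrary assumptions, not values from the repository.

```python
import numpy as np

# Hypothetical stand-in for the real feature matrix: 5 cells x 20 features.
rng = np.random.default_rng(0)
features = rng.random((5, 20))

# Row-wise standardization followed by an averaged dot product.
normalized_features = (features - features.mean(1, keepdims=True)) / features.std(1, keepdims=True)
s = np.dot(normalized_features, normalized_features.T) / normalized_features.shape[1]

# Reference: NumPy's Pearson correlation between rows.
reference = np.corrcoef(features)

print(np.allclose(s, reference))
print(np.allclose(np.diag(s), 1.0))
```

Note that a row with zero variance (a cell with identical values for every feature) would cause a division by zero in the standardization step, so real data may need such rows filtered out first.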

lixhere commented 1 year ago

Hello, could you please send me the datasets? I saw a lot of data on the website they provided, but I don't know which ones they used. Thanks!

handesome commented 1 year ago

Me too. Have you solved the problem?

handesome commented 1 year ago

> Hello, could you please send me the datasets? I saw a lot of data on the website they provided, but I don't know which ones they used. Thanks!

Hi! Do you know how to use the data now? Could you give some advice? Thanks.