MaayanLab / sigcom-lincs

Signature Commons LINCS Repo

Clarifying the values in the file: xpr_coeff_mat.gctx "LINCS L1000 CRISPR Perturbations (2021)" #64

Closed shkao closed 1 year ago

shkao commented 1 year ago

Hi,

On the download page, I found "LINCS L1000 CRISPR Perturbations (2021)" (xpr_coeff_mat.gctx). Are the values log2 fold changes between CRISPR KO and wild type for the 12,327 genes (landmark + inferred), or are they Z-scores?

I ask because the log2 fold change values for the knocked-out genes are not close to or below, say, -1. For example, if the knockout halved expression, I would expect the log2 fold change of MYC in the signature HAHN001_A549_96H_A10_MYC (from xpr_coeff_mat.gctx) to be around log2(KO / wild-type) = log2(0.5) = -1. However, it is greater than 0:

> library(cmapR)
> data <- parse_gctx("xpr_coeff_mat.gctx")
reading xpr_coeff_mat.gctx
> df_exp <- as.data.frame(mat(data))
> df_exp["MYC", "HAHN001_A549_96H_A10_MYC"]
[1] 0.00477146

I then checked the distribution of these values (the log2 fold change of each targeted gene within its own KO signature) and got this:

[image]

Does this mean that most of the targeted genes are not really knocked out? And that even when they are knocked out, the log2FC values are no smaller than -0.1?
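
For reference, the per-signature lookup behind that distribution can be sketched roughly as follows. This is only an illustration on a made-up toy matrix standing in for mat(data). It assumes that the target gene is the last underscore-separated token of the signature ID (as in HAHN001_A549_96H_A10_MYC), and the helper name target_of is hypothetical.

```r
# Toy stand-in for mat(data): rows are genes, columns are signature IDs.
# The values are made up for illustration only.
m <- matrix(
  c(0.005, -0.012, 0.003,
    0.020, -0.001, 0.007),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("MYC", "TP53"),
    c("HAHN001_A549_96H_A10_MYC",
      "HAHN001_A549_96H_A11_TP53",
      "HAHN001_A549_96H_A12_XYZ")
  )
)

# Assumption: the target gene is the last underscore-separated token of the ID.
target_of <- function(sig_id) sub(".*_", "", sig_id)

targets <- target_of(colnames(m))

# Keep only signatures whose target gene is actually a row of the matrix.
keep <- targets %in% rownames(m)

# Indexing with a two-column character matrix picks one (gene, signature)
# cell per kept signature, matched against the dimnames.
self_values <- m[cbind(targets[keep], colnames(m)[keep])]
names(self_values) <- colnames(m)[keep]

print(summary(self_values))
# hist(self_values) would then draw the distribution.
```

With the real file, m would be replaced by mat(data), and signatures whose target is not among the measured or inferred genes would be dropped by the keep filter.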

jeevangelista commented 1 year ago

Hi @shkao, the values in the matrix are characteristic direction (CD) coefficients.

shkao commented 1 year ago

Hi @jeevangelista, thanks a lot! How do you interpret the CD coefficient for gene MYC in the MYC-KO signature HAHN001_A549_96H_A10_MYC being 0.00477146?
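
One possible way to put this value in context is sketched below. It assumes that each signature column is normalized to unit length, which is how the characteristic direction is usually defined but is not confirmed in this thread for this file. Under that assumption the squared entries of a column sum to 1, so the typical (root-mean-square) entry is 1 / sqrt(number of genes). The toy vector x is made up and only stands in for one column of mat(data).

```r
n_genes <- 12327  # number of genes quoted earlier in the thread

# If a signature is a unit vector, its squared entries sum to 1,
# so the root-mean-square entry is 1 / sqrt(n_genes).
rms_expected <- 1 / sqrt(n_genes)

# Toy unit vector standing in for one column of mat(data) (assumption).
set.seed(1)
x <- rnorm(n_genes)
x <- x / sqrt(sum(x^2))

# Check the unit-length assumption; on real data this would be
# sum(mat(data)[, "HAHN001_A549_96H_A10_MYC"]^2).
print(sum(x^2))

# Size of the MYC entry relative to the typical entry magnitude.
ratio <- abs(0.00477146) / rms_expected
print(ratio)
```

If the assumption holds, the MYC entry is roughly half the typical entry magnitude, so a value near zero would say little about the knockout on its own. If sum(x^2) on the real column is not close to 1, this scale argument does not apply.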