gabrielakinker / CCLE_heterogeneity

possible bug in the code at `sapply(meta_sig_tumor_all, function(x) apply(nmf_programs_sig_ccle[,unlist(nmf_meta_programs_ccle)], 2, function(y) length(intersect(x,y))/length(union(x,y))))`? #3

Closed smk5g5 closed 2 years ago

smk5g5 commented 2 years ago

Is there something wrong with the following line?

indprog_jaccard_vivovitro <- sapply(meta_sig_tumor_all, function(x) apply(nmf_programs_sig_ccle[,unlist(nmf_meta_programs_ccle)], 2, function(y) length(intersect(x,y))/length(union(x,y))))

Whenever I run it, I get an "incorrect number of dimensions" error.

In my case, I am comparing modules from my snRNA-seq and scRNA-seq datasets.

My nmf_meta_sig_snrna and nmf_meta_sig_scrna are list objects that hold tables of genes, for example:

> class(nmf_meta_sig_scrna$meta1)
[1] "table"

> (nmf_meta_sig_scrna$meta1)
    CRLF1      NGFR    PMEPA1    COL8A1     GAP43     GFRA2      NEFL    PDLIM7
1.0000000 1.0000000 1.0000000 0.8571429 0.8571429 0.8571429 0.8571429 0.8571429
    RUNX2     SESN3     TGFBI      CCN3      ECM1    IGFBP5       NES     PAPPA
0.8571429 0.8571429 0.8571429 0.7142857 0.7142857 0.7142857 0.7142857 0.7142857
   PLXND1  SERPINE1     ABCB4   ANGPTL2    COL1A1  CRISPLD2     KCNE4      MT1X
0.7142857 0.7142857 0.5714286 0.5714286 0.5714286 0.5714286 0.5714286 0.5714286
    PDGFC    RASAL2      RBP1     TENM3   TMEM59L   TRABD2A   COL18A1    COL7A1
0.5714286 0.5714286 0.5714286 0.5714286 0.5714286 0.5714286 0.4285714 0.4285714
   DCBLD2    FAM20C   IFITM10     ITGA2   OLFML2A    SEMA3C     SPHK1     SYT10
0.4285714 0.4285714 0.4285714 0.4285714 0.4285714 0.4285714 0.4285714 0.4285714
   TMEM47     TRUB2     ACSL4  ADAMTSL1    ADGRG1     CADM3      CCN5  CDC42EP1
0.4285714 0.4285714 0.2857143 0.2857143 0.2857143 0.2857143 0.2857143 0.2857143
 CDC42EP3      COQ4      CTSC     CYTOR      ETS1    FBXO32     FXYD5    GNPTAB
0.2857143 0.2857143 0.2857143 0.2857143 0.2857143 0.2857143 0.2857143 0.2857143
    ITGA3     MEF2C      NEFM      NNMT       OAF     PLAUR       PTN     SYTL2
0.2857143 0.2857143 0.2857143 0.2857143 0.2857143 0.2857143 0.2857143 0.2857143
TNFRSF12A
0.2857143

gabrielakinker commented 2 years ago

Hi Saad,

I think that, in your case, the easiest way to compare the gene signatures using the Jaccard index would be the following.

scRNA-seq signatures (`sCrnaseq`):

sCrnaseq <- list(meta1_sC = table(rep(paste0("Gene_", letters[1:10]), 1:10)),
                 meta2_sC = table(rep(paste0("Gene_", letters[11:15]), 1:5)))

sCrnaseq
$meta1_sC
Gene_a Gene_b Gene_c Gene_d Gene_e Gene_f Gene_g Gene_h Gene_i Gene_j 
     1      2      3      4      5      6      7      8      9     10 
$meta2_sC
Gene_k Gene_l Gene_m Gene_n Gene_o 
     1      2      3      4      5 

snRNA-seq signatures (`sNrnaseq`):

sNrnaseq <- list(meta1_sN = table(rep(paste0("Gene_", letters[5:14]), 1:10)),
                 meta2_sN = table(rep(paste0("Gene_", letters[12:16]), 1:5)))

sNrnaseq
$meta1_sN
Gene_e Gene_f Gene_g Gene_h Gene_i Gene_j Gene_k Gene_l Gene_m Gene_n 
     1      2      3      4      5      6      7      8      9     10 
$meta2_sN
Gene_l Gene_m Gene_n Gene_o Gene_p 
     1      2      3      4      5 

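Calculate the Jaccard index on the gene names of each pair of signatures:

# Columns of the result are the sCrnaseq signatures, rows are the sNrnaseq signatures
jaccard_index <- sapply(sCrnaseq, function(x) {
  sapply(sNrnaseq, function(y) {
    length(intersect(names(x), names(y))) / length(union(names(x), names(y)))
  })
})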

jaccard_index
          meta1_sC  meta2_sC
meta1_sN 0.4285714 0.3636364
meta2_sN 0.0000000 0.6666667
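
As for the original error: the line from the repository indexes `nmf_programs_sig_ccle` with `[, ...]`, which assumes a matrix with one program per column. If a list of tables is used in its place, that indexing is the likely source of the "incorrect number of dimensions" message. The sketch below reproduces this on toy data and wraps the calculation above in two small helpers. The names `sc_sigs`, `sn_sigs`, `jaccard` and `jaccard_matrix` are hypothetical and are not part of the repository code.

```r
# Toy stand-ins for two lists of signature tables (hypothetical data,
# built the same way as the example above).
sc_sigs <- list(meta1_sC = table(rep(paste0("Gene_", letters[1:10]), 1:10)),
                meta2_sC = table(rep(paste0("Gene_", letters[11:15]), 1:5)))
sn_sigs <- list(meta1_sN = table(rep(paste0("Gene_", letters[5:14]), 1:10)),
                meta2_sN = table(rep(paste0("Gene_", letters[12:16]), 1:5)))

# Matrix-style indexing on a list fails; capture the error message
# instead of stopping the script.
err <- tryCatch(sc_sigs[, "meta1_sC"],
                error = function(e) conditionMessage(e))
print(err)

# Jaccard index of two character vectors of gene names.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Pairwise Jaccard between two lists of named tables.
# The genes are taken from the table names; the table values
# (counts or frequencies) are ignored.
# Columns follow sigs_a, rows follow sigs_b.
jaccard_matrix <- function(sigs_a, sigs_b) {
  genes_a <- lapply(sigs_a, names)
  genes_b <- lapply(sigs_b, names)
  sapply(genes_a, function(x) sapply(genes_b, function(y) jaccard(x, y)))
}

res <- jaccard_matrix(sc_sigs, sn_sigs)
print(res)
```

With your objects, the call would presumably be `jaccard_matrix(nmf_meta_sig_scrna, nmf_meta_sig_snrna)`, assuming every element is a named table like the `meta1` you printed.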