aertslab / GENIE3

GENIE3 (GEne Network Inference with Ensemble of trees) R-package

getLinkList does NOT sort the importance correctly. #6

Closed ccshao closed 5 years ago

ccshao commented 6 years ago

It seems getLinkList does not work properly with the example code:

library(GENIE3)
set.seed(123)
exprMat <- matrix(sample(1:10, 100, replace=TRUE), nrow=20)
rownames(exprMat) <- paste("Gene", 1:20, sep="")
colnames(exprMat) <- paste("Sample", 1:5, sep="")
weightMat <- GENIE3(exprMat, regulators=paste("Gene", 1:5, sep=""))
linkList <- getLinkList(weightMat)
#- my function to get the link list
source("1.txt")
ll2 <- get.link.list.2(weightMat)
linkList[, 3] - ll2[, 3]
linkList[36:40,]
ll2[36:40,]
> weightMat
          Gene1     Gene10     Gene11     Gene12     Gene13     Gene14
Gene1 0.0000000 0.36186918 0.16918191 0.33108669 0.10656398 0.07530360
Gene2 0.1392679 0.22527890 0.07528511 0.07692856 0.26210701 0.35274037
Gene3 0.3331730 0.05667154 0.04212907 0.27510784 0.03422291 0.11565126
Gene4 0.2671128 0.27614697 0.19238345 0.25202215 0.39066984 0.42727314
Gene5 0.2604462 0.08003340 0.52102045 0.06485476 0.20643626 0.02903163
         Gene15     Gene16     Gene17     Gene18     Gene19      Gene2
Gene1 0.1427551 0.03535717 0.20557081 0.12705088 0.40529361 0.09381347
Gene2 0.1909348 0.34588125 0.26966089 0.34407123 0.13017368 0.00000000
Gene3 0.1022015 0.03214449 0.09569742 0.05557983 0.03285738 0.35117874
Gene4 0.4511947 0.27590408 0.28452077 0.44896797 0.15668635 0.49396928
Gene5 0.1129139 0.31071301 0.14455010 0.02433009 0.27498899 0.06103851
          Gene20      Gene3      Gene4     Gene5     Gene6      Gene7     Gene8
Gene1 0.20431987 0.35315052 0.15667685 0.2868659 0.3017547 0.36438042 0.4228414
Gene2 0.38735334 0.14671994 0.55507937 0.2623781 0.1563841 0.17148157 0.1102381
Gene3 0.05552606 0.00000000 0.20530120 0.1228994 0.2258975 0.09387827 0.1317118
Gene4 0.14733463 0.44311875 0.00000000 0.3278567 0.1769851 0.25661764 0.1859770
Gene5 0.20546611 0.05701079 0.08294258 0.0000000 0.1389786 0.11364211 0.1492318
          Gene9
Gene1 0.3063666
Gene2 0.1123121
Gene3 0.2034257
Gene4 0.2535187
Gene5 0.1243769

The link from Gene4 to Gene12 (0.2520221) has a larger weight than the link from Gene3 to Gene6 (0.2258975), but it is ignored.
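One quick way to confirm this kind of ordering problem is to test whether the weight column is monotonically decreasing. The sketch below is only an illustration: the data frame is built by hand from three values in the matrix above, the column names `regulatoryGene`, `targetGene` and `weight` are assumed for the example, and `isSortedDecreasing` is a hypothetical helper, not part of GENIE3.

```r
# Hand-built link list using three weights copied from the matrix above.
# Column names are assumptions made for this illustration.
ll <- data.frame(regulatoryGene = c("Gene4", "Gene3", "Gene4"),
                 targetGene     = c("Gene2", "Gene6", "Gene12"),
                 weight         = c(0.49396928, 0.2258975, 0.2520221),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if x is in non-increasing order.
isSortedDecreasing <- function(x) !is.unsorted(rev(x))

# The hand-built list is out of order (0.2258975 comes before 0.2520221).
unsortedCheck <- isSortedDecreasing(ll$weight)

# Re-ordering by weight restores the expected ranking.
llSorted <- ll[order(ll$weight, decreasing = TRUE), ]
sortedCheck <- isSortedDecreasing(llSorted$weight)
```

The same check can be applied to the weight column of any link list to spot misplaced entries without comparing rows by eye.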

My function

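get.link.list.2 <- function(weight.matrix, report.max = NULL) {
  # Note: report.max is accepted but not used in this version.
  # Label every cell as "target=regulator" (columns are targets, rows are regulators).
  m.names <- sapply(colnames(weight.matrix), function(X) paste(X, rownames(weight.matrix), sep = "="))
  # Order all cells by decreasing weight.
  sorted.indices <- order(weight.matrix, decreasing=TRUE)
  df <- data.frame(name = m.names[sorted.indices],
                   value = weight.matrix[sorted.indices])
  # Split the labels back into target and regulator names.
  index.names <- t(sapply(as.character(df[, 1]),
                          function(X) unlist(strsplit(X, "="))))
  link.list <- data.frame(from.gene = index.names[, 2],
                          to.gene = index.names[, 1],
                          im = df[, 2], stringsAsFactors=FALSE)
  # Drop self-links by comparing gene names rather than matrix positions.
  link.list.2 <- link.list[link.list[, 1] != link.list[, 2], ]
  rownames(link.list.2) <- NULL
  link.list.2
}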

However, it seems to work fine with a square weight matrix.

s-aibar commented 6 years ago

The function getLinkList converts the diagonal of the matrix to NA. This seems to work in most cases, but indeed, if the rows/columns are not in the same order, it will not be correct! Thank you for the bug report, I will correct it ASAP.
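To make the problem concrete, here is a minimal sketch on a toy matrix. It assumes the diagonal is blanked by position, as `diag<-` does; this is an illustration of the behaviour described above, not the actual GENIE3 source or the eventual fix. The matrix `w` and the variables `byPos`, `byName` and `self` are made up for the example.

```r
# Toy weight matrix: 2 regulators (rows) x 4 targets (columns).
# Columns are in alphabetical order, so "Gene10" comes before "Gene2".
w <- matrix(c(0.0, 0.5,   # target Gene1
              0.3, 0.2,   # target Gene10
              0.4, 0.0,   # target Gene2
              0.1, 0.6),  # target Gene3
            nrow = 2,
            dimnames = list(c("Gene1", "Gene2"),
                            c("Gene1", "Gene10", "Gene2", "Gene3")))

# Positional approach: blanks cells [1,1] and [2,2].
# Cell [2,2] is Gene2 -> Gene10, a real link, so it is lost,
# while the true self-link Gene2 -> Gene2 is left in place.
byPos <- w
diag(byPos) <- NA

# Name-based approach: blank only cells whose row and column names match.
self <- outer(rownames(w), colnames(w), "==")
byName <- w
byName[self] <- NA
```

With matching by name, the order of rows and columns no longer matters, so the same code handles both square and non-square weight matrices.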

Thank you!