YaohuiZeng / grpregOverlap

Regularization paths of linear, logistic, Poisson, or Cox models with overlapping grouped covariates

Error in `dimnamesGets` when calling the `grpregOverlap` function #8

Open t2ag3 opened 1 year ago

t2ag3 commented 1 year ago

Hi! I'm trying to run grpregOverlap, but the code below doesn't work for me.

library(grpregOverlap)

## same, but now X is a matrix
n <- 10
p <- 3
X <- matrix(rnorm(n*p), n, p)
colnames(X) <- c("gene1","gene2","gene3")
group <- list(
  "pathway1" = c("gene1", "gene2"),
  "pathway2" = c("gene2", "gene3")
)
y <- rnorm(10)

## This call raises the error (the original comment said "fitting works")
fm <- grpregOverlap(X, y, group, returnX.latent = TRUE)

Originally posted by @dankessler in https://github.com/YaohuiZeng/grpregOverlap/issues/7#issuecomment-1104120172

It returns the following error message:

Error in dimnamesGets(x, value) : 
  length of Dimnames[[2]] (3) is not equal to Dim[2] (2)

Could you please check this? Thanks!

LamineTourelab commented 1 year ago

Hi, the problem comes from the `expandX` function in the grpregOverlap R code, specifically the `dimnames` argument passed to `Matrix()`:

expandX <- function(X, group) {
  incidence.mat <- incidenceMatrix(X, group) # group membership incidence matrix
  over.mat <- Matrix(incidence.mat %*% t(incidence.mat), sparse = TRUE, 
                     dimnames = dimnames(incidence.mat)) # overlap matrix
  grp.vec <- rep(1:nrow(over.mat), times = diag(over.mat)) # group index vector

  # expand X to X.latent
  X.latent <- NULL
  names <- NULL

  ## the following code will automatically remove variables not included in 'group'
  for(i in 1:nrow(incidence.mat)) {
    idx <- incidence.mat[i,]==1
    X.latent <- cbind(X.latent, X[, idx, drop=FALSE])
    names <- c(names, colnames(incidence.mat)[idx])
#     colnames(X.latent) <- c(colnames(X.latent), colnames(X)[incidence.mat[i,]==1])
  }
  colnames(X.latent) <- paste('grp', grp.vec, '_', names, sep = "")
  X.latent
}

I just commented out that argument. Here is the modified code:

expandX <- function(X, group) {
  incidence.mat <- incidenceMatrix(X, group) # group membership incidence matrix
  over.mat <- Matrix(incidence.mat %*% t(incidence.mat), sparse = TRUE) # overlap matrix
  # removed argument: dimnames = dimnames(incidence.mat)
  grp.vec <- rep(1:nrow(over.mat), times = diag(over.mat)) # group index vector

  # expand X to X.latent
  X.latent <- NULL
  names <- NULL

  ## the following code will automatically remove variables not included in 'group'
  for(i in 1:nrow(incidence.mat)) {
    idx <- incidence.mat[i,]==1
    X.latent <- cbind(X.latent, X[, idx, drop=FALSE])
    names <- c(names, colnames(incidence.mat)[idx])
#     colnames(X.latent) <- c(colnames(X.latent), colnames(X)[incidence.mat[i,]==1])
  }
  colnames(X.latent) <- paste('grp', grp.vec, '_', names, sep = "")
  X.latent
}

You can try this; it may help. Thanks
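
A minimal base-R sketch of why dropping the `dimnames` argument should leave the output of `expandX` unchanged: in the function above, `over.mat` is only read through `nrow()` and `diag()`, and neither depends on the matrix's names. The incidence matrix here is written by hand for the example in this issue (an assumption, not the output of the package's `incidenceMatrix()`), and plain base matrices stand in for the sparse `Matrix` object.

```r
# Hand-written incidence matrix for the example in this issue:
# rows are groups, columns are variables (illustrative stand-in only).
incidence.mat <- matrix(c(1, 1, 0,
                          0, 1, 1),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("pathway1", "pathway2"),
                                        c("gene1", "gene2", "gene3")))

# Overlap matrix with and without names
over.named <- incidence.mat %*% t(incidence.mat)
over.plain <- unname(over.named)

# Group index vector, computed the same way as in expandX
grp.named <- rep(1:nrow(over.named), times = diag(over.named))
grp.plain <- rep(1:nrow(over.plain), times = diag(over.plain))
all(grp.named == grp.plain)

# Variable names collected group by group, mirroring the loop in expandX
var.names <- unlist(lapply(seq_len(nrow(incidence.mat)), function(i) {
  colnames(incidence.mat)[incidence.mat[i, ] == 1]
}))

# Column names of the expanded (latent) design matrix
latent.names <- paste("grp", grp.plain, "_", var.names, sep = "")
latent.names
```

This only checks the pieces visible in `expandX`; whether other parts of the package rely on the names of the overlap matrix is not covered here.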

t2ag3 commented 1 year ago

Wow! It works! You're my Hero! Thanks a lot!

By the way, does this change affect the results? If it doesn't, could you fix the code on the master branch of this repository?

LamineTourelab commented 1 year ago

Hi, thanks, great to see that it works!

That's the main question I was asking myself. The overlap matrix gives the number of variables shared between groups, so both of its dimensions are indexed by group. The incidence matrix is different: its rows are groups and its columns are variables. The two matrices therefore do not have the same dimensions, which is exactly what the error reports (3 column names for 2 columns). As far as I can tell there is no problem in the results, so I think the change is fine. I will open a pull request so it can be reviewed.
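
The mismatch described above can be reproduced in a small base-R sketch. The incidence matrix is written by hand for the example in this issue (an assumption, not the output of the package's `incidenceMatrix()`), and a base matrix replaces the sparse `Matrix` object, so the error text differs from the `dimnamesGets` message even though the cause is the same.

```r
# Hand-written incidence matrix: rows are groups, columns are variables
incidence.mat <- matrix(c(1, 1, 0,
                          0, 1, 1),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("pathway1", "pathway2"),
                                        c("gene1", "gene2", "gene3")))

# Overlap matrix, stripped of names so we can try to label it ourselves
over.mat <- unname(incidence.mat %*% t(incidence.mat))

dim(incidence.mat)  # groups x variables
dim(over.mat)       # groups x groups

# Reusing the incidence matrix's dimnames fails: the second element has
# one name per variable, but over.mat has one column per group.
bad <- tryCatch({
  dimnames(over.mat) <- dimnames(incidence.mat)
  "assigned"
}, error = function(e) "error")
bad

# Group names on both axes do fit the overlap matrix
grp.names <- rownames(incidence.mat)
dimnames(over.mat) <- list(grp.names, grp.names)
over.mat
```

Labelling both axes with the group names would be one alternative to removing the argument, if named output is wanted; this is only a suggestion and not what the pull request mentioned in this thread does.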

t2ag3 commented 1 year ago

Thanks for the response and the pull request! I'm really impressed at how you get things done really fast! I agree with your point. That's an astute opinion, I think. The overlap matrix and the incidence matrix would not have the same dimensions in almost all cases. I'm hoping your pull request will be accepted quickly, because this package and functions would be useful for bioinformaticians with omics data like me! Thank you very much for your kindness.