Error in exp[as.character(g), ] : incorrect number of dimensions

zeavin-ferguson commented 4 years ago

I am trying to run imsig on some mouse gene expression datasets, and I get this error: Error in exp[as.character(g), ] : incorrect number of dimensions

Although these are mouse datasets, the gene names are in HGNC format. I ran them through the HGNC multi-symbol checker (https://www.genenames.org/tools/multi-symbol-checker/) to confirm that my genes are represented in the HGNC database: 88.6% match approved symbols. There are no duplicate gene names and no missing values.

Maybe I do not have enough overlap with the ImSig genes? How can I check the overlap between my expression data and ImSig? I do not see where the signature genes are stored. I have attached the expression dataset I am trying to run imsig on, in case that is helpful: exp_data_for_imsig.txt

zeavin-ferguson commented 4 years ago

I found the list of ImSig genes in the paper. NK cells are not represented in my brain datasets, and only 4 of the NK cell markers are in my blood datasets. Most of the other cell types are reasonably well represented in my datasets, although coverage varies by cell type.

I also noticed that I get a different error if I restrict the number of rows. The error I get if I only use 500 genes is:

Error in fastCor(t(exp)) : invalid nSplit: 0

ajitjohnson commented 4 years ago

Hi @zeavin-ferguson, you should also be able to access the signature by typing sig after loading the package.

Again, the error "Error in fastCor(t(exp)) : invalid nSplit: 0" is probably due to either poor overlap between the signature genes and your dataset, or duplicate gene names in the dataset.

I should have just accounted for these when I made the package. Unfortunately, I don't have the bandwidth to do it now :(
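A minimal sketch of how both conditions could be checked by hand. It assumes the signature is a data frame with columns named gene and cell (as in the functions later in this thread); check_overlap is a hypothetical helper, not part of the package, and the toy gene names are placeholders rather than the actual ImSig signature.

```r
# Toy stand-ins (hypothetical): replace with your own expression table and signature.
exp <- data.frame(
  s1 = c(5, 3, 8, 1),
  s2 = c(6, 2, 9, 4),
  row.names = c("CD3E", "CD19", "ALB", "GAPDH")
)
sig <- data.frame(
  gene = c("CD3E", "CD2", "CD19", "MS4A1", "NCAM1"),
  cell = c("T cells", "T cells", "B cells", "B cells", "NK cells"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each cell type, count the signature genes
# and how many of them appear in the row names of the expression table.
check_overlap <- function(exp, sig) {
  found <- sig$gene %in% row.names(exp)
  total <- tapply(found, sig$cell, length)
  present <- tapply(found, sig$cell, sum)
  data.frame(
    cell = names(total),
    n_signature = as.vector(total),
    n_present = as.vector(present),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

overlap <- check_overlap(exp, sig)
print(overlap)

# Duplicate gene names are easiest to spot on the raw identifier column,
# before it is used as row names (data frames do not allow duplicate row names).
gene_ids <- c("CD3E", "CD19", "CD3E")
dup_genes <- unique(gene_ids[duplicated(gene_ids)])
print(dup_genes)
```

Cell types with zero or very few genes in n_present are the ones most likely to cause trouble.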

zeavin-ferguson commented 4 years ago

Hi @ajitjohnson That makes sense then that the error:

Error in fastCor(t(exp)) : invalid nSplit: 0

happens when I restrict the rows of my dataset.

But I checked the overlap and it is pretty good for the whole dataset. However, when I use the whole dataset I am getting the error: Error in exp[as.character(g), ] : incorrect number of dimensions

Any idea what that one is about? I attached my dataset to the first message.

ajitjohnson commented 4 years ago

Hi @zeavin-ferguson I just took a look at your data.

The expression set should not be scaled data. For correlation to work appropriately, it needs to be in natural scale without log transformation (e.g. FPKM/TPM).

Although this might not be the issue that you are facing.
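If the data were log-transformed, a rough sketch of getting back to natural scale could look like the following. It assumes, purely for illustration, that the transform was exactly log2(x + 1); the toy values and gene names are hypothetical. Data that were centered or scaled (e.g. z-scored) cannot be recovered this way without the original means and standard deviations.

```r
# Toy log2(x + 1) values (hypothetical); rows are genes, columns are samples.
log_exp <- data.frame(
  s1 = c(0, 1, 3),
  s2 = c(2, 4, 5),
  row.names = c("geneA", "geneB", "geneC")
)

# Undo log2(x + 1). Only valid if that exact transform was applied (assumption).
natural_exp <- 2^log_exp - 1

# Sanity check: natural-scale values such as FPKM/TPM should never be negative.
stopifnot(all(natural_exp >= 0))
print(natural_exp)
```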

Alternatively, you could simply look at the mean/median expression of all the signature genes, without the correlation step.

Here is what you could do.

# Mean Expression Function

imsig <- function(exp,sig){
  # Subset genes that are present in the sig
  exp <- exp[row.names(exp) %in% sig$gene,]
  sig <- sig[sig$gene %in% row.names(exp),]
  # Loop to calculate the average expression of each cell type
  cc <- data.frame(matrix(nrow = ncol(exp)))
  cc <- cc[,-1]
  for (i in unique(sig$cell)){
    s <- sig[sig$cell %in% i,]
    # drop = FALSE keeps a single-gene subset two-dimensional, so colMeans still works
    e <- exp[as.character(s$gene), , drop = FALSE]
    e_avg <- data.frame(colMeans(e, na.rm = TRUE))
    colnames(e_avg) <- i
    cc <- cbind(cc, e_avg)
  }
  return(cc)
}

# Plotting Function

plot_abundance <- function(proportion){
  require(ggplot2)
  require(gridExtra)
  cell <- proportion
  cell$samples <- row.names(cell)
  cell$samples <- factor(cell$samples, levels = cell$samples)
  plots = lapply(1:(ncol(cell)-1), function(x) ggplot(cell, aes(x = cell$samples, y = cell[,x]))
                 + geom_bar(stat = "identity") + theme_classic() +
                   theme(axis.title.x=element_blank(), axis.text.x = element_text(angle = 90, hjust = 1), axis.title.y=element_blank())+
                   ggtitle(colnames(cell)[x]))
  do.call(grid.arrange,  plots)
}

# Load the package (provides sig) and the dataset
library(imsig)
exp <- read.table('exp_data_for_imsig.txt', header = TRUE, row.names = 1, sep = '\t')
sig <- sig

# Run the function
proportion <- imsig(exp, sig)
plot_abundance(proportion)

ajitjohnson commented 4 years ago

I also noticed that you have two cell types with only one gene in the signature, so I recommend doing this:

sig <- sig[!sig$cell %in% c('NK cells',  'Plasma cells'),]
sig <- droplevels(sig)
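The same filtering could also be done without hard-coding the cell type names. The sketch below assumes a signature with gene and cell columns; the toy genes, the exp_genes vector, and the threshold of two genes are illustrative choices, not something the package prescribes.

```r
# Toy signature and expression gene list (hypothetical values).
sig <- data.frame(
  gene = c("CD3E", "CD2", "CD19", "MS4A1", "NCAM1"),
  cell = factor(c("T cells", "T cells", "B cells", "B cells", "NK cells")),
  stringsAsFactors = FALSE
)
exp_genes <- c("CD3E", "CD2", "CD19", "NCAM1")  # stands in for row.names(exp)

# Keep only signature genes that are present in the expression data.
sig_present <- sig[sig$gene %in% exp_genes, ]

# Count the remaining genes per cell type and keep types with at least two.
n_per_cell <- table(as.character(sig_present$cell))
keep <- names(n_per_cell)[n_per_cell >= 2]

sig_filtered <- droplevels(sig_present[sig_present$cell %in% keep, ])
print(sig_filtered)
```

A mean over a single gene is just that gene's expression, which is why one-gene cell types are dropped here.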

zeavin-ferguson commented 4 years ago

Thank you @ajitjohnson!! The functions work great - I just had to change V1 and V2 for sig to gene and cell, respectively. I will also try using FPKM or TPM normalized data and using the correlation step to see if that was the issue. I appreciate your help!
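For anyone hitting the same column-name mismatch, one way to handle it is to rename the signature columns before calling the functions above. This is a sketch that assumes the signature table arrived with the default column names V1 (genes) and V2 (cell types), for example after reading a headerless file; sig_raw and its contents are hypothetical.

```r
# Hypothetical signature table with default column names.
sig_raw <- data.frame(
  V1 = c("CD3E", "CD19"),
  V2 = c("T cells", "B cells"),
  stringsAsFactors = FALSE
)

# Rename by matching the old names, so the column order does not matter.
sig_named <- sig_raw
colnames(sig_named)[colnames(sig_named) == "V1"] <- "gene"
colnames(sig_named)[colnames(sig_named) == "V2"] <- "cell"

print(colnames(sig_named))
```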

ajitjohnson / imsig

Error in exp[as.character(g), ] : incorrect number of dimensions #12
