dynverse / anndata

Annotated multivariate observation data in R
https://anndata.dynverse.org

anndata dataframes and matrices #14

Open GreenGilad opened 2 years ago

GreenGilad commented 2 years ago

Hi there,

When I store a matrix with row names and column names in the uns slot, the names are removed. I assume this is to align with the Python NumPy implementation, where they are not supported.

To work around losing the row and column names, I can convert the R matrix to an R data frame. However, when I do so, the anndata object stores it as a pandas DataFrame. Why does the R anndata object store an R data frame as a pandas DataFrame instead of keeping it in the R format? Couldn't the conversion happen transparently, only when reading and writing the h5ad file, so that once loaded the object has the class of an R data frame? Currently, every time I want to use such a data frame I must call reticulate::py_to_r, and I still lose the row and column names when doing so.

Couldn't it be the same as anndata$X?

Related to this is the case where the matrix contains character values. Here I cannot get the matrix back with its names and in a proper matrix shape: it comes back flat even if I try to reshape it.

My scenario is a square, symmetric correlation matrix together with matching matrices of p-values, multiple-testing-adjusted p-values, and significance asterisks.

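# Spearman correlations plus a p-value for every pair of columns of X
data$uns$ss.cor <- list(
  names = colnames(data$X),
  corr = stats::cor(data$X, use = "pairwise.complete.obs", method = "spearman"),
  # cor.test() drops incomplete pairs itself, so it takes no `use` argument
  pval = outer(seq_len(ncol(data$X)), seq_len(ncol(data$X)), Vectorize(function(i, j)
    stats::cor.test(data$X[, i], data$X[, j], method = "spearman")[["p.value"]]))
)
# p.adjust() returns a plain vector, so rebuild the matrix shape afterwards
data$uns$ss.cor$adj.pval <- matrix(p.adjust(data$uns$ss.cor$pval, method = "BH"), nrow = nrow(data$uns$ss.cor$pval))
# Character matrix of significance asterisks derived from the adjusted p-values
data$uns$ss.cor$sig <- matrix(cut(data$uns$ss.cor$adj.pval, c(-.1, 0.001, 0.01, 0.05, Inf), c("***", "**", "*", "")), nrow = nrow(data$uns$ss.cor$pval))
# Record the settings used, so the results can be reproduced later
data$uns$ss.cor$params <- list(cor.method = "spearman",
                               cor.use = "pairwise.complete.obs",
                               p.adjust.method = "BH")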

To keep it simple I am showing the above using data$X, but in reality I am using a matrix with a different shape than X, which is why I use uns and not varp.
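For illustration, one possible stopgap is sketched below in base R only. It does not call the anndata package: it splits a named matrix into a flat vector, its dimensions, and its row and column names, and rebuilds the matrix on read. `pack_matrix` and `unpack_matrix` are hypothetical helper names, not part of anndata, and the sketch assumes that plain vectors inside a nested list survive the round trip through uns, which has not been verified here.

```r
# Hypothetical helpers (not part of the anndata package): split a named matrix
# into plain pieces and rebuild it afterwards.
pack_matrix <- function(m) {
  stopifnot(is.matrix(m))
  list(
    values   = as.vector(m),  # flat, column-major copy without dims or names
    dim      = dim(m),
    rownames = if (is.null(rownames(m))) character(0) else rownames(m),
    colnames = if (is.null(colnames(m))) character(0) else colnames(m)
  )
}

unpack_matrix <- function(p) {
  # unlist()/as.integer() are defensive, in case the pieces come back as
  # lists or doubles after a conversion (an assumption, not observed behaviour)
  m <- matrix(unlist(p$values),
              nrow = as.integer(p$dim[1]),
              ncol = as.integer(p$dim[2]))
  if (length(p$rownames) > 0) rownames(m) <- unlist(p$rownames)
  if (length(p$colnames) > 0) colnames(m) <- unlist(p$colnames)
  m
}

# Toy data standing in for the correlation and asterisk matrices
corr <- matrix(c(1, 0.5, 0.5, 1), nrow = 2,
               dimnames = list(c("a", "b"), c("a", "b")))
sig <- matrix(c("", "*", "*", ""), nrow = 2, dimnames = dimnames(corr))

# This list is what would be assigned to an entry of uns
packed <- list(corr = pack_matrix(corr), sig = pack_matrix(sig))

restored_corr <- unpack_matrix(packed$corr)
restored_sig <- unpack_matrix(packed$sig)
```

The same pair of helpers covers both the numeric and the character case, since the shape is carried in `dim` rather than inferred from the stored values.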

Thanks!

rcannood commented 2 years ago

Hey Gilad!

Could you provide me with a small reproducible example?

I'll think about whether I can find a "nice" solution to your problem while still being compatible with the standard anndata interface.

Robrecht