Winnie09 / Lamian

39 stars 9 forks source link

Trouble running Lamian using UMAP embeddings #31

Open DavidJDeGeorge opened 2 weeks ago

DavidJDeGeorge commented 2 weeks ago

Firstly, thank you for putting together such a useful tool!

Apologies in advance if this is a naive issue as I am relatively new to R and scRNAseq analysis. I am trying to use Lamian to compare pseudotime on a single cell dataset derived from two mouse models as part of my PhD. I have used Harmony to integrate my separate samples and am working with a Seurat object.

Following the online examples for Lamian, I have managed to put together the following code that allows me to create the equivalent of "res" in the online example and to evaluate branch uncertainty when I use the Harmony embeddings - this is as far as I have gone so far so this seems to work.

umap_coords <- Embeddings(seurat_obj, reduction = "umap")
umap_matrix <- as.matrix(umap_coords)

log_norm_data <- as.matrix(seurat_obj@assays[["RNA"]]@data)

metadata <- seurat_obj@meta.data
genotype_info <- metadata[["genotype"]]

cellanno <- data.frame(
  cell = rownames(metadata),
  sample = genotype_info
)

lamian_input <- list(
  pca = umap_matrix,
  expression = log_norm_data,
  cellanno = cellanno
)

set.seed(12345)
res = infer_tree_structure(pca = lamian_input[['pca']],
                           expression = lamian_input[['expression']],
                           cellanno = lamian_input[['cellanno']],
                           origin.marker = c('Myb'),
                           number.cluster = 10,
                           xlab='Principal component 1',
                           ylab = 'Principal component 2')

Lamian::plotmclust(res, cell_point_size = 0.5,
                   x.lab = 'Principal component 1',
                   y.lab = 'Principal component 2')

result <- evaluate_uncertainty(res, n.permute=100)

However, I want to visualise branches in the UMAP space that I have used for the rest of my analysis. When I change my input to instead include the UMAP embeddings (I have done this by simply replacing "harmony" in the above code with "umap" for now, I am able to generate the "res" equivalent and generate a plot with branches overlaid on my UMAP space. Except now when I try to evaluate uncertainty I receive the following error message:

Error in branchcomb[tmpcombid, 2] : subscript out of bounds

As far as I can tell, the structures of the two "res" objects (when either using "umap" embeddings or "harmony" embeddings are quite similar). I am possibly being too naive and it might just not be possible to run this on UMAP embeddings, however, it would be good to know a way of showing the branching in the context of something more relatable to the rest of my analysis, regardless.

Thank you for any assistance or clarifications that can be provided.

Alexis-Varin commented 2 weeks ago

Hi, I am using UMAP embeddings and passing evaluate_uncertainty() fine so you can definitely use them. I am not quite sure what you mean by "harmony embeddings", harmony isn't a low dimensional space like UMAP, you only have 3 choices, PCA, t-SNE, or UMAP. Embeddings(seurat_obj, reduction = "harmony") will work because Seurat 5 now stores the integration results as corrected PCA coordinates instead of the integrated Assay before but these reductions should not be used for any downstream analysis, only for UMAP calculation.

Can you do c(ncol(log_norm_data), length(umap_matrix)/2, length(umap_coords)/2, length(cellanno$cell))

You should obtain 4 times the same number.