cole-trapnell-lab / monocle3

Other
340 stars 102 forks source link

Suggestion to use Group information as the root and end of trajectory to replace the embryo.time.bin #634

Open curiegat opened 2 years ago

curiegat commented 2 years ago

Dear Developer team,

Many thanks for building such a fantastic software, however I have some question regarding my pseudo time analysis.

Currently I am working with a Seurat object from a TNK data. There are three clusters in there, and I create the cds object using the script below:

`# Using Seurat clustering

Read thr RDS file

tnk = readRDS("~/TNK_01.rds")

parsing the gene names

gene_annotation = as.data.frame(rownames(tnk@reductions[["pca"]]@feature.loadings), row.names = rownames(tnk@reductions[["pca"]]@feature.loadings))
colnames(gene_annotation) = "gene_short_name"

parsing the cell information

cell_metadata = as.data.frame(tnk@assays[["RNA"]]@counts@Dimnames[[2]], row.names = tnk@assays[["RNA"]]@counts@Dimnames[[2]])
colnames(cell_metadata) = "barcode"

parsing the sparse matrix

new_matrix = tnk@assays[["RNA"]]@counts
new_matrix = new_matrix[rownames(tnk@reductions[["pca"]]@feature.loadings),]
expression_matrix = new_matrix

creating the cds object

cds_from_seurat = new_cell_data_set(expression_matrix, cell_metadata = cell_metadata, gene_metadata = gene_annotation)
cds_from_seurat = preprocess_cds(cds_from_seurat, num_dim = 10, method = 'PCA')
cds_from_seurat = reduce_dimension(cds_from_seurat, preprocess_method = 'PCA')
cds_from_seurat = cluster_cells(cds_from_seurat)

creating and assigning the made-up partition

recreate.partition = c(rep(1, length(cds_from_seurat@colData@rownames)))
names(recreate.partition) = cds_from_seurat@colData@rownames
recreate.partition = as.factor(recreate.partition)

assigning the cluster information

list_cluster = tnk@meta.data[["integrated_snn_res.0.1"]]
names(list_cluster) = tnk@assays[["RNA"]]@data@Dimnames[[2]]
cds_from_seurat@clusters@listData[["UMAP"]][["clusters"]] = list_cluster
cds_from_seurat@clusters@listData[["UMAP"]][["louvain_res"]] = "NA"

assigning feature loading for the downstream module analysis

cds_from_seurat = preprocess_cds(cds_from_seurat)
cds_from_seurat@principal_graph_aux@listData = list(tnk@reductions[["pca"]]@feature.loadings)

allow the package to create the graph

cds_from_seurat <- learn_graph(cds_from_seurat, use_partition = T)

ordering and plotting the cluster

cds_from_seurat  = order_cells(cds_from_seurat , reduction_method = "UMAP")
plot_cells(cds_from_seurat)

`

However, with this script we have to manually choose the roots. Meanwhile, we aim for the program to to specify the root of the trajectory programmatically. We wonder how do we do that and replace the embryo.time.bin (as specified in here [https://cole-trapnell-lab.github.io/monocle3/docs/trajectories/])with the information of sample Group (infected and naive) from Seurat object that we have.

We would like to share our RDS file from the Seurat object if it is necessary.

Looking forward for your help and suggestion.

Best regards,

Agatha

zandigohar commented 1 year ago

I modified the code to assign my initial nodes to cell_type1 from my cell_types info as below:

get_earliest_principal_node <- function(cds, whatever_you_name_it="cell_type1"){
  cell_ids <- which(colData(cds)[, "cell_types"] == whatever_you_name_it)
  closest_vertex <-  cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
  closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
  root_pr_nodes <-  igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names(which.max(table(closest_vertex[cell_ids,]))))]
  root_pr_nodes
}

Everything is the same as the tutorial script except for "cell_types" and "cell_type1". You may modify it to get what you want (e.g. "infected" and whatever colname you use from your metadata, in my case it was "cell_types").

curiegat commented 1 year ago

Hi

Thank you for your suggestion, I tried to use the code but it gives me this error

Screenshot 2023-02-07 at 16 47 33

However, I previously use this code and it is working

## this function is working; change the root bits from here - https://github.com/statOmics/tradeSeq/issues/38

get_earliest_principal_node_1 <- function(cds_try, time_bin="state"){
  cell_ids <- which(colData(cds_try)[,"embryo.time.bin"] == time_bin)

  closest_vertex <-
  cds_try@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
  closest_vertex <- paste0("Y_", closest_vertex)
  root <- names(which(igraph::degree(principal_graph(cds_try)[["UMAP"]]) == 1))
  root <- root[root %in% closest_vertex]
  root
}

Maybe you have other opinion why this one is working while previously give the error.

Thank you and best regards,

Agatha