cole-trapnell-lab / monocle3

Other
337 stars 101 forks source link

Request: Explanation on programmatic assignment of root nodes per partition #405

Open ncalistri opened 4 years ago

ncalistri commented 4 years ago

In the Monocle 3 documentation under the "Order cells in pseudotime" section it says:

Note that we could easily do this on a per-partition basis by first grouping the cells by partition using the partitions() function. This would result in all cells being assigned a finite pseudotime.

It is not clear to me how to use the partitions() function to achieve this. It looks like this function simply fetches the data stored at cds@clusters$UMAP$partitions, and I don't see an obvious place to use this info in the provided root assignment function. You can of course subset the CDS and create one for each partition, but the language quoted above suggests there is a more elegant way to achieve the desired result.

Thanks!

ncalistri commented 4 years ago

An inefficient but working solution, still curious if there is a more efficient solution baked into the Monocle functions that I missed:

get_earliest_principal_node <- function(cds, time_bin ="130-170"){
  for(partition_id in unique(partitions(cds))){

    partition_cell_ids <- names(which(partitions(cds) == partition_id))
    temp_cds <- cds[ , partition_cell_ids]

    cell_ids <- which(colData(temp_cds)[, "embryo.time.bin"] == time_bin)

    closest_vertex <- temp_cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
    closest_vertex <- as.matrix(closest_vertex[colnames(temp_cds), ])
    root_pr_node <- igraph::V(principal_graph(temp_cds)[["UMAP"]])$name[as.numeric(names
    (which.max(table(closest_vertex[cell_ids,]))))]

    if( partition_id == unique(partitions(cds))[1]){
        root_pr_nodes <- root_pr_node
      } else{
        root_pr_nodes <- c(root_pr_nodes, root_pr_node)
      }
    }
  root_pr_nodes
}
cds_3d <- order_cells(cds_3d, root_pr_nodes=get_earliest_principal_node(cds_3d))
chuyaowang commented 10 months ago

It's a little late, but I was confused by these words as well. The solution above can be problematic when the time_bin in the function call is not present in all partitions. In those partitions it would fail to find a root node. The code to get partition_cell_ids also does not work in the current version. Here is my solution:

library(dplyr)
library(gtools)
part_factors <- partitions(cds)
root_per_partition <- lapply(unique(part_factors),function(x){
    # find cell ids belonging to each partition
    partition_cell_ids <- names(part_factors)[which(part_factors==x)]
    # slice the partition
    temp_cds <- cds[,partition_cell_ids]
    # identify the earliest time in the partition
    earliest_time <- colData(temp_cds)[,'embryo.time.bin'] %>% unique %>% mixedsort %>%`[`(1)
    # find principal node using the function above
    temp <- get_earliest_principal_node(temp_cds,time_bin = earliest_time)
    return(temp)
}) %>% unlist

cds <- order_cells(cds, root_pr_nodes=root_per_partition)

The get_earliest_principal_node function is unchanged from the one in the original tutorial.