Open ncalistri opened 4 years ago
An inefficient but working solution, still curious if there is a more efficient solution baked into the Monocle functions that I missed:
get_earliest_principal_node <- function(cds, time_bin ="130-170"){
for(partition_id in unique(partitions(cds))){
partition_cell_ids <- names(which(partitions(cds) == partition_id))
temp_cds <- cds[ , partition_cell_ids]
cell_ids <- which(colData(temp_cds)[, "embryo.time.bin"] == time_bin)
closest_vertex <- temp_cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(temp_cds), ])
root_pr_node <- igraph::V(principal_graph(temp_cds)[["UMAP"]])$name[as.numeric(names
(which.max(table(closest_vertex[cell_ids,]))))]
if( partition_id == unique(partitions(cds))[1]){
root_pr_nodes <- root_pr_node
} else{
root_pr_nodes <- c(root_pr_nodes, root_pr_node)
}
}
root_pr_nodes
}
cds_3d <- order_cells(cds_3d, root_pr_nodes=get_earliest_principal_node(cds_3d))
It's a little late, but I was confused by these words as well. The solution above can be problematic when the time_bin
in the function call is not present in all partitions. In those partitions it would fail to find a root node. The code to get partition_cell_ids
also does not work in the current version. Here is my solution:
library(dplyr)
library(gtools)
part_factors <- partitions(cds)
root_per_partition <- lapply(unique(part_factors),function(x){
# find cell ids belonging to each partition
partition_cell_ids <- names(part_factors)[which(part_factors==x)]
# slice the partition
temp_cds <- cds[,partition_cell_ids]
# identify the earliest time in the partition
earliest_time <- colData(temp_cds)[,'embryo.time.bin'] %>% unique %>% mixedsort %>%`[`(1)
# find principal node using the function above
temp <- get_earliest_principal_node(temp_cds,time_bin = earliest_time)
return(temp)
}) %>% unlist
cds <- order_cells(cds, root_pr_nodes=root_per_partition)
The get_earliest_principal_node
function is unchanged from the one in the original tutorial.
In the Monocle 3 documentation under the "Order cells in pseudotime" section it says:
Note that we could easily do this on a per-partition basis by first grouping the cells by partition using the partitions() function. This would result in all cells being assigned a finite pseudotime.
It is not clear to me how to use the partitions() function to achieve this. It looks like this function simply fetches the data stored at cds@clusters$UMAP$partitions, and I don't see an obvious place to use this info in the provided root assignment function. You can of course subset the CDS and create one for each partition, but the language quoted above suggests there is a more elegant way to achieve the desired result.
Thanks!