cole-trapnell-lab / monocle3

Input for Ordering Cells Along Pseudotime #240

Closed stephenchea closed 4 years ago

stephenchea commented 5 years ago

I have single cells from three different real-time timepoints. Instead of supplying genes to order all of these cells along pseudotime, is there a way to specify one timepoint as the beginning of pseudotime and another timepoint as the end, and then order all the cells along pseudotime?

BrianLohman commented 4 years ago

I would like to know the answer to this as well. It's not clear from the documentation if this is possible.

RoganGrant commented 4 years ago

At this point it would seem that pseudotime might not be necessary (and "real" time would suffice). Forcing an order would effectively be doing the work of the algorithm, and if the trajectory isn't being correctly predicted, it may not be possible to draw one with your dataset.

If you are still worried about time-points not being perfectly synchronized, you can select the cluster(s) most enriched for your earliest time points / progenitor markers as your root nodes, and then adjust as necessary until the trajectory makes biological sense. In my (limited) experience, trying different combinations of root nodes, normalization parameters (regressing out %mito did it for me), and quality cutoffs for cells/genes can make a significant difference. You can also adjust UMAP parameters. This is just my experience from experimenting with a couple of published datasets, so your mileage may vary.
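As a minimal sketch of the "cluster most enriched for the earliest time point" idea, the base R snippet below uses toy metadata. The data frame `meta` and its column names `cluster` and `timepoint` are hypothetical stand-ins for whatever is stored in `colData(cds)`; nothing here calls Monocle itself.

```r
# Toy per-cell metadata; in practice this would come from colData(cds).
# Column names and values are hypothetical.
meta <- data.frame(
  cluster   = c("1", "1", "1", "2", "2", "2", "3", "3", "3"),
  timepoint = c("day0", "day0", "day3", "day3", "day3", "day7",
                "day7", "day7", "day0")
)

# Fraction of each cluster's cells that come from each time point
# (rows = clusters, columns = time points, each row sums to 1).
frac_by_cluster <- prop.table(table(meta$cluster, meta$timepoint), margin = 1)

# Cluster with the highest fraction of cells from the earliest time point.
root_cluster <- names(which.max(frac_by_cluster[, "day0"]))
print(frac_by_cluster)
print(root_cluster)
```

The cluster picked this way is only a starting candidate; as noted above, it may need adjusting until the trajectory makes biological sense.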

saisomesh2594 commented 4 years ago

@stephenchea and @BrianLohman to directly answer your question, the documentation does provide a function to select cells from the earliest 'real' time-point as root nodes and order cells in pseudotime. I am copy-pasting the function here:

    # a helper function to identify the root principal points:
    get_earliest_principal_node <- function(cds, time_bin="130-170"){
      cell_ids <- which(colData(cds)[, "embryo.time.bin"] == time_bin)

      closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
      closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
      root_pr_nodes <- igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names(which.max(table(closest_vertex[cell_ids,]))))]

      root_pr_nodes
    }

You can change embryo.time.bin to the name of the column indicating the time-point, and time_bin to your earliest time point, and the function will give you the root for the partition. Does this solve your problem? Maybe I missed something?
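To make the helper's logic concrete, here is a small base R sketch of the same majority vote on toy data. The vectors `time_point`, `closest_vertex` and `vertex_names` are hypothetical stand-ins for the time-point column in `colData(cds)`, the closest-vertex matrix, and the principal graph's vertex names; the final commented line shows how the result would be passed to `order_cells()`, assuming a cell_data_set on which `learn_graph()` has already been run.

```r
# Hypothetical stand-ins for the objects the helper reads from the cds.
time_point     <- c("t1", "t1", "t1", "t2", "t2", "t3")
closest_vertex <- matrix(c(4, 4, 7, 7, 9, 9), ncol = 1,
                         dimnames = list(paste0("cell", 1:6), NULL))
vertex_names   <- paste0("Y_", 1:10)  # illustrative principal node names

# Cells from the earliest real time point.
cell_ids <- which(time_point == "t1")

# Principal-graph vertex that is closest to the largest number of those cells.
root_idx     <- as.numeric(names(which.max(table(closest_vertex[cell_ids, ]))))
root_pr_node <- vertex_names[root_idx]
print(root_pr_node)

# With a real cell_data_set (requires monocle3; not run here):
# cds <- order_cells(cds, root_pr_nodes = get_earliest_principal_node(cds, time_bin = "t1"))
```

Note that the helper as posted has the column name "embryo.time.bin" hard-coded, so it needs to be edited (or turned into an argument) to match your own metadata.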

Somesh

w2niva commented 4 years ago

I need help with this issue as well. What Somesh suggested probably works well if there is some information about the initial cluster identity. I have been able to use ordercells to set the root successfully and perform differential gene expression on the trajectory. But I am unable to narrow down my cell data set (the equivalent of "embryo.time" in the Monocle tutorial) in order to use some genes with plot_cell_trajectory. Is there a way to subset the data if prior clustering information or cell type information is unavailable?

hpliner commented 4 years ago

@w2niva I'm not sure I understand your question - are you trying to decide where to put your root node? If so, you will need to use some previous biology as the algorithm can't distinguish beginning from end. I recommend using gene expression to get an idea of where the beginning might be.
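One rough way to use gene expression for this, sketched in base R on toy data: compare the mean expression of a known early/progenitor marker across clusters and treat the highest-expressing cluster as a candidate starting point. The matrix `expr`, the gene names and the cluster labels are all hypothetical; with a real dataset the same check can be done visually, for example by colouring cells by marker expression with `plot_cells(cds, genes = ...)`.

```r
# Toy normalized expression matrix (genes x cells); names are hypothetical.
expr <- matrix(c(5, 6, 4, 1, 0, 1,
                 0, 1, 0, 4, 5, 6),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("progenitor_marker", "mature_marker"),
                               paste0("cell", 1:6)))
cluster <- c("A", "A", "A", "B", "B", "B")

# Mean expression of the early marker within each cluster.
mean_by_cluster <- tapply(expr["progenitor_marker", ], cluster, mean)

# Cluster with the highest early-marker expression: a candidate for the root.
candidate_root_cluster <- names(which.max(mean_by_cluster))
print(mean_by_cluster)
print(candidate_root_cluster)
```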

Seems like @saisomesh2594 solved the remainder of the questions in this thread (thanks!). If not, please reopen.