Multiple root/end states

s849 commented 4 years ago

Hi,

Thank you for developing URD. I am wondering whether it is possible to assign multiple root/end states using the urdSubset function. I tried assigning them as (... c("end1", "end2", "end3")), which was accepted, but I then got an error downstream.

Currently, I have 6 time points, but three of them are "end points", all starting from the same root (although there may be more than one root, since this cluster of cells is heterogeneous). Is it possible to assign a particular subset of this cluster of cells as the root?

Any help is much appreciated.

Thanks,

farrellja commented 4 years ago

Hi @s849. I'm not totally sure that I understand your question, but I will try to answer. You may have to clarify.

Here is what I think you want to know: your 'tip' populations don't have to come from one stage. So, if you have a developmental process that's happening over time, where you start to see differentiated cells appearing in the middle of your timecourse and continuing to accumulate over time, you could define those differentiated cells from each timepoint as the tips. (And each tip can actually incorporate cells from multiple timepoints.) You would do this without actually subsetting the data, just by defining a clustering across all of the cells (create a column in @group.ids). You can also use cells from multiple timepoints as the root if, for instance, you have a persistent stem cell population that's present across several of your timepoints. If you have multiple distinct root cell types, however, you will have to subset and build multiple trees.
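
A minimal sketch of the labeling step described above, in base R only. It assumes a plain data frame standing in for the object's @group.ids slot. The column names `stage` and `cell.type`, the cell type names, and the example values are all hypothetical and are not taken from URD or from this thread.

```r
# Stand-in for object@group.ids: one row per cell (hypothetical example data).
group.ids <- data.frame(
  stage     = c("T1", "T2", "T3", "T4", "T4", "T5", "T6", "T6"),
  cell.type = c("progenitor", "progenitor", "progenitor", "neuron",
                "progenitor", "neuron", "neuron", "muscle"),
  row.names = paste0("cell", 1:8),
  stringsAsFactors = FALSE
)

# Label tips by cell identity, not by stage, so that one tip can pool
# cells from several timepoints. Cells that are not tips stay NA.
differentiated <- c("neuron", "muscle")
group.ids$tip.clusters <- ifelse(
  group.ids$cell.type %in% differentiated,
  group.ids$cell.type,
  NA_character_
)

# Cross-tabulate tips against stages to check which stages feed each tip.
tip.by.stage <- table(group.ids$tip.clusters, group.ids$stage)
print(tip.by.stage)
```

With a real URD object, the same kind of column would presumably be assigned on the object's @group.ids and its name passed as tip.group.id, as in the simulateRandomWalksFromTips call further down this thread.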

s849 commented 4 years ago

Hi,

Thanks for your previous comment/advice! I have been able to run URD with my data set. I followed both the tutorial and the long example provided as a supplemental file with your original paper.

I, however, have a couple more questions that perhaps you can help out with:

1. If I have multiple time points (not done in duplicate), do you still recommend trying the batch correction step? (I don't think so, but would like to hear your opinion.)
2. For random walks, can all of the clusters in the final pseudotime stage be used as tips? Or do you recommend narrowing down this list of clusters to "true tips" that are biologically identifiable and meaningful?
3. In the #Define the tip cells example (Page 38 in the SupplementaryAnalysis.pdf file), what is tip.to.walk? I was not able to run this section because I am not sure what this is or how to generate it. Is there an example that can be used as a reference?
4. Since I was not able to run #3 above, I did not run the simulateRandomWalk function and instead ran simulateRandomWalksFromTips, as below:

axial.walks <- simulateRandomWalksFromTips(axial, tip.group.id = "tip.clusters", root.cells = root.cells, transition.matrix = biased.tm, n.per.tip = 25000, root.visits = 1, max.steps = 5000, verbose = FALSE)

Instead of processing the random walks with the processRandomWalks function as in this example, I processed mine as:

axial <- processRandomWalksFromTips(axial, axial.walks, verbose = FALSE)

Is this reasonable? (I am running this locally; it takes a long time, but it gets done!)

The tree I have built looks a bit strange (see below).

Is this a result of using all clusters from the final stage as tips?

I was able to identify some marker genes along each branch using only the setdiff function (I believe this is the URD DE testing; I did not try NMF). Do you recommend trying both?
Visualization of marker genes is giving me some trouble. Does the geneCascadeProcess function usually take a long time to run? I let it run for a couple of days but did not get any results and ended up stopping it.

Sorry for the long thread. I am sure URD will be very useful in my project. Hoping you can provide some advice!

farrellja / URD

Multiple root/end states #55