farrellja / URD

URD - Reconstruction of Branching Developmental Trajectories
GNU General Public License v3.0

cells were not visited by a branch that exists at their pseudotime and were not assigned. #3

Closed joshua-gould closed 5 years ago

joshua-gould commented 6 years ago

I'm getting the following warning in buildTree. Do you know what the cause could be? Thanks.

In assignCellsToSegments(object, pseudotime, verbose) : 19348 cells were not visited by a branch that exists at their pseudotime and were not assigned.

farrellja commented 6 years ago

Hi Joshua,

There are several reasons this could have happened. First, it's normal for some fraction of cells to not be assigned, but 19k seems like a lot (unless your data set is truly enormous).

The warning arises during cell assignment, which works as follows. First, the branching dendrogram is determined from the data. Then, for each cell, all of the branches that exist in the dendrogram at the cell's pseudotime are compared, and the cell is assigned to the branch from which it was most highly visited. Cells may go unassigned for a couple of reasons: first, perhaps the cell was not visited highly by random walks from any tip; or, second, the cell was visited by the walks from a tip, but that segment has already fused with other tips/segments at that cell's pseudotime, and the cell was not well visited by the combined segment (whose visitation is calculated as the weighted mean of visitation from all downstream/child segments). (Imagine a cell visited highly from tip 4, but not from tips 5, 6, or 7: if those four segments have already fused together into a common branch at that cell's pseudotime, the cell might not get assigned because its averaged visitation from tips 4-7 would not be very high.)
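To make the tip 4-7 example concrete, here is a toy sketch in Python. It is not URD's code: the visitation numbers, the equal weights, and the cutoff of 0.5 are all invented for illustration, and how URD actually weights child segments is not shown here.

```python
# Toy illustration only; this is not URD's implementation and the numbers are invented.

def combined_visitation(visit_by_tip, weights):
    """Weighted mean of one cell's visitation across the tips fused into a segment."""
    total_weight = sum(weights[tip] for tip in visit_by_tip)
    weighted_sum = sum(visit_by_tip[tip] * weights[tip] for tip in visit_by_tip)
    return weighted_sum / total_weight

# Hypothetical cell: visited heavily from tip 4, barely from tips 5, 6, and 7.
visit_by_tip = {4: 0.80, 5: 0.02, 6: 0.01, 7: 0.01}

# Assumption: equal weights for every tip.
weights = {4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0}

# Hypothetical cutoff, standing in for a visitation threshold.
threshold = 0.5

before_fusion = visit_by_tip[4]
after_fusion = combined_visitation(visit_by_tip, weights)

print(f"tip 4 alone: {before_fusion:.2f}, assigned: {before_fusion >= threshold}")
print(f"fused 4-7:   {after_fusion:.2f}, assigned: {after_fusion >= threshold}")
```

With these made-up values the cell clears the cutoff while tip 4 is still its own segment, but falls below it once the four segments are averaged together, which is the situation described above.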

(1) One possibility is that you didn't perform enough random walk simulations, which has left many cells unvisited. You can use pseudotimePlotStabilityOverall after loading a particular set of random walks to see whether the calculated pseudotime has reached an asymptote, which would indicate that enough simulations were performed.

(2) Another possibility is that there are one or more 'tips' in the data that you didn't perform any walks from. If this were the case, you would expect a large, contiguous region of the data to be unvisited, and you might just need to define more tips. Tips may not all be in the final stage of your data set if there are populations that disappear. You can check which cells in your data were not assigned to the tree by using plotDim(object, label="segment", na.rm=F, ...), which will show all of the cells that were not assigned to the tree in grey (their segment will be NA). (This will plot on the tSNE by default, but you can use dim.red="pca" or dim.red="dm" to plot on pairs of PCs or diffusion components.) You can plot visitation from particular tips with plotDim(object, label="visitfreq.log.[tip]", ...) (e.g. "visitfreq.log.3" to plot log10 of cell visitation from tip 3). That will help you check whether cells were not visited by any of the walks, or whether they were visited but remain unassigned.
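The distinction those plots are meant to reveal can be sketched as a simple triage. The snippet below is a toy Python illustration with invented cell records, not URD output; the field names segment and max_visit are hypothetical stand-ins for a cell's assigned segment and its highest visitation from any tip.

```python
# Toy illustration only; the records are invented and the field names are hypothetical.
cells = {
    "cell_a": {"segment": "3", "max_visit": 0.40},   # assigned to a segment
    "cell_b": {"segment": None, "max_visit": 0.00},  # never visited by any walk
    "cell_c": {"segment": None, "max_visit": 0.25},  # visited, but still unassigned
    "cell_d": {"segment": None, "max_visit": 0.00},  # never visited by any walk
}

def triage(cells):
    """Split unassigned cells into never-visited and visited-but-unassigned."""
    unvisited, visited_unassigned = [], []
    for name, record in cells.items():
        if record["segment"] is not None:
            continue  # assigned cells need no follow-up
        if record["max_visit"] == 0:
            unvisited.append(name)
        else:
            visited_unassigned.append(name)
    return unvisited, visited_unassigned

unvisited, visited_unassigned = triage(cells)
print("never visited:", unvisited)
print("visited but unassigned:", visited_unassigned)
```

Cells in the first group point toward too few walks, missing tips, or overly biased walks; cells in the second group point toward the pseudotime or buildTree parameters.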

(3) A third possibility is that the bias in your random walks is so extreme that cells on the fringes of your data are not visited. This would be apparent on plots of the diffusion components, where visitation would be limited to the center of the spikes and would not extend to their edges.

If cells were visited, but were unassigned, then there are a couple of possible reasons I can think of off the top of my head:

(4) It could represent a poor pseudotime calculation that causes cells of similar expression to have dramatically different pseudotimes (in which case, perhaps the sigma of your diffusion map should be adjusted).

(5) It could represent poor parameters for the buildTree function, where p.thresh is too small and segments are fusing too aggressively.

You might also try lowering visit.threshold in buildTree, which would allow cells to be assigned to a segment with lower visitation.

Hope that helps / gives you some additional tools or ideas for investigating what URD is doing with your data.