cole-trapnell-lab / monocle-release


Cells arranged in spiky lines in trajectory plots #180

Open CodeInTheSkies opened 6 years ago

CodeInTheSkies commented 6 years ago

Hi Monocle Team,

Sometimes, I get cells arranged in spiky lines in trajectory plots, as shown in the attached screenshot.

I'm using 10X UMI counts and following the tutorial. I obtained clusters with Seurat earlier in the pipeline, but then pass the raw data on to Monocle.

I use fullModelFormulaStr = '~Seurat_ClusterID' in differentialGeneTest, where Seurat_ClusterID contains the Seurat cluster IDs imported through meta.data. I then use the top 1000 differential genes obtained.
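For context, the steps described above correspond roughly to the standard Monocle 2 ordering workflow. This is only a sketch: it assumes `cds` is an existing CellDataSet built from the raw 10X counts, and that `pData(cds)$Seurat_ClusterID` holds the imported Seurat cluster labels.

```r
# Sketch of the Monocle 2 ordering workflow described above.
# Assumes `cds` is a CellDataSet and pData(cds)$Seurat_ClusterID
# contains the Seurat cluster IDs imported via meta.data.
library(monocle)

diff_test_res <- differentialGeneTest(cds,
                                      fullModelFormulaStr = "~Seurat_ClusterID")

# Take the top 1000 differential genes (ranked by q-value) as ordering genes
ordering_genes <- row.names(diff_test_res[order(diff_test_res$qval), ])[1:1000]

cds <- setOrderingFilter(cds, ordering_genes)
cds <- reduceDimension(cds, max_components = 2, method = "DDRTree")
cds <- orderCells(cds)
plot_cell_trajectory(cds, color_by = "Seurat_ClusterID")
```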

Does anybody know why the cells get arranged in spiky lines like this? Is there any way to avoid it and make the trajectories more continuous?

My guess is that this happens when there are thousands of cells: when many cells are allocated very similar pseudotime values, the algorithm arranges them in lines so that they don't obscure each other.

Are there any better explanations?

Many thanks for any responses.

[screenshot: trajectory plot with cells arranged in spiky lines]

Xiaojieqiu commented 6 years ago

Hi @CodeInTheSkies, first of all, please check out the recent release of Monocle 3 alpha and our new tutorial on trajectory inference: http://cole-trapnell-lab.github.io/monocle-release/monocle3/#tutorial-1-learning-trajectories-with-monocle-3. With the improved trajectory inference methods in Monocle 3 alpha, there won't be any spiked cells.

The following is an explanation of what the spiked-cell pattern means and how you can alleviate it if you want to stay on Monocle 2. As you guessed, spiked cells appear when you have a lot of cells but only a small number of centroids representing the developmental trajectory in the principal graph. Cells on the same spike correspond to the same principal graph node. You can alleviate this by increasing the number of centroids, for example by passing ncenter = 700 to the reduceDimension call.
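Concretely, the suggestion above amounts to something like the following. This is a sketch under the assumption that `cds` is your existing CellDataSet with ordering genes already set, and 700 is just the example value from the comment, not a recommendation for every dataset.

```r
# Rebuild the DDRTree embedding with more principal-graph centroids,
# then re-order the cells. `cds` is assumed to be a CellDataSet that
# already has ordering genes selected via setOrderingFilter().
cds <- reduceDimension(cds, max_components = 2, method = "DDRTree",
                       ncenter = 700)
cds <- orderCells(cds)
plot_cell_trajectory(cds)
```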

CodeInTheSkies commented 6 years ago

Thank you very much! I'll try the solution, and also the latest Monocle.


CodeInTheSkies commented 6 years ago

Hi @Xiaojieqiu,

As per your advice, I tried setting ncenter to a high value of 700, and it did get rid of the spiked-cells problem! But now I get many small branches. What I want to make sure of is: by setting ncenter to such a large value, am I forcing the algorithm to find too many branches? Or does the high value only serve as a kind of upper limit, so that the algorithm would not strain itself to find extra branches just to satisfy the large ncenter, but would instead pick as many branches as it deems appropriate for the data?

Thank you very much.

CodeInTheSkies commented 6 years ago

Further, @Xiaojieqiu, I just want to add that, basically, I'm wondering how to choose that large ncenter value. Can we simply set it to a large value like 700, chosen more or less arbitrarily, and then trust the algorithm to determine the number of branches optimally? Or should we employ some systematic strategy to choose the value of ncenter?

Thanks a lot for the insights!

Xiaojieqiu commented 6 years ago

Hi @CodeInTheSkies, you can think of ncenter as a resolution parameter (similar to the resolution parameter in, for example, Louvain clustering). A large ncenter means the algorithm will try to find a structure that represents the data manifold in more detail. The algorithm doesn't explicitly constrain the number of branches, but it does use an optimization function to identify as many branches as it deems appropriate for the data. We use a function (cal_ncenter) to automatically pick a reasonable ncenter based on the size of the data. I hope this helps; the following are responses to your questions.

But what I want to ensure is by setting ncenter to such a large value, am I forcing the algorithm to find too many branches?

Not really. More centers often lead to more branches, but the number of branches does not depend only on the number of centers.

Or, is it that the high value is only going to serve as some kind of upper limit, and so the algorithm would not strain itself to find too many branches to satisfy the large ncenter, but rather would optimally pick as many branches as it deems appropriate for the data?

Yes, that is the safer way to put it.

Can we just set it to a large value like 700, presumably arbitrarily chosen, and then trust the algorithm to optimally determine the number of branches? Or, should we employ any systematic strategy to choose that large value of ncenter?

I agree with you on the first part. The algorithm returns the optimal (although not strictly globally optimal) graph structure based on ncenter, the other parameters of the algorithm, and the data itself. If the data is very clean, tuning parameters is not necessary, but it becomes more of an art when your data has a more intricate structure.

I suggest you test your data with Monocle 3, which solves a few fundamental limitations of the Monocle 2 algorithm.
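For reference, the cal_ncenter heuristic mentioned above can be inspected directly: it is an internal (unexported) function in the monocle package, reachable with the `:::` operator. The definition below is a re-statement from memory of the Monocle 2 source and may differ between versions, so treat it as an approximation and check `monocle:::cal_ncenter` on your own installation for the exact formula.

```r
# Approximation of Monocle 2's internal heuristic for choosing ncenter
# from the number of cells. Verify against your installed version by
# printing monocle:::cal_ncenter, which shows the exact definition used.
cal_ncenter <- function(ncells, ncells_limit = 100) {
  round(2 * ncells_limit * log(ncells) / (log(ncells) + log(ncells_limit)))
}

cal_ncenter(5000)  # 130 centroids for 5,000 cells under this formula
```

Calling `monocle:::cal_ncenter(ncol(cds))` on your own CellDataSet would then report the value the automatic path would pick for your data.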

CodeInTheSkies commented 6 years ago

Thank you very much!! Those explanations clear things up very nicely. I think I will try a few values for ncenter, and then see if the detailed structure makes biological sense. Many thanks, again, for your efforts answering my questions.

CodeInTheSkies commented 6 years ago

By the way, just another quick question: how can I extract the auto-determined ncenter value that the method calculates? I first ran it that way, and now I'm trying different values of ncenter as explicit input, but I would like to know the value that the algorithm automatically determined for my data.

Thank you!

CodeInTheSkies commented 5 years ago

Hi @Xiaojieqiu,

Just wondering, do you have an answer to my last question above? That is, how do I get the ncenter value that is automatically calculated? This information would be very helpful, and after that we can close this issue.

Many thanks in advance!