IMB-Computational-Genomics-Lab / ascend

R package - Analysis of Single Cell Expression, Normalisation and Differential expression (ascend)

cluster 0 #8

Closed jonxujun closed 6 years ago

jonxujun commented 6 years ago

Just wondering: in what cases will the dynamic tree cut generate cluster 0, meaning that an individual cell has not been assigned to any of the available clusters?

Thanks!

asenabouth commented 6 years ago

It's possible; haven't come across a dataset that produces this result though. You can have a look at what the dynamic tree cut initially produces by examining the rand matrix. The initial tree cut will be under "REF".

Here is the call that produces these results:

original.clusters <- unname(dynamicTreeCut::cutreeDynamic(original.tree,
                        distM=as.matrix(distance.matrix), verbose=0))

Let me know if you do come across a dataset that produces these results. It would be interesting to see how it would affect the CORE algorithm.

jonxujun commented 6 years ago

Yeah, I ran into this situation in my data. By debugging into the function (dynamicTreeCut), I can see that some cells were not assigned to any cluster. But I cannot think of a scenario where the tree is cut (no matter which threshold is used) and yet some leaves in some branches are left untouched. I mean, at least their upper branches should be cut, and so those leaves should be clustered...

asenabouth commented 6 years ago

It could be that there are not enough members to form a cluster. I suggest you try playing around with the following arguments of the cutreeDynamic function: minClusterSize and possibly cutHeight. By default, the minimum number of cells required to form a cluster is 20. I could add this as an argument to the RunCORE function to account for this.
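
To illustrate the suggestion above, here is a minimal sketch of how minClusterSize can change the number of cells left in cluster 0. It assumes the dynamicTreeCut package is installed. The toy data, the choice of Ward linkage, and the helper name cut_tree are hypothetical and are not taken from ascend. Whether any cells end up labelled 0 depends on the data, so no particular output is claimed.

```r
library(dynamicTreeCut)

set.seed(1)
# Toy data (hypothetical): two tight groups of 30 cells plus 5 scattered cells.
group.a  <- matrix(rnorm(30 * 10, mean = 0, sd = 0.3), nrow = 30)
group.b  <- matrix(rnorm(30 * 10, mean = 5, sd = 0.3), nrow = 30)
outliers <- matrix(runif(5 * 10, min = -20, max = 20), nrow = 5)
cells    <- rbind(group.a, group.b, outliers)

distance.matrix <- dist(cells)
original.tree   <- hclust(distance.matrix, method = "ward.D2")

# Hypothetical helper: cut the same tree with a given minimum cluster size.
cut_tree <- function(min.size) {
  unname(cutreeDynamic(original.tree,
                       distM = as.matrix(distance.matrix),
                       minClusterSize = min.size, verbose = 0))
}

labels.default <- cut_tree(20)  # 20 is the cutreeDynamic default
labels.small   <- cut_tree(5)

# Label 0 marks cells that were not assigned to any cluster.
table(labels.default)
table(labels.small)
```

Comparing the two tables shows how many cells each setting leaves unassigned for this particular dataset.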

jonxujun commented 6 years ago

That would be helpful! My current logic is to remove the cluster 0 cells using SubsetCluster, then run another round of clustering and remove them again, looping until no cluster 0 cells remain. Does this logic sound reasonable to you?
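
The loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it calls dynamicTreeCut directly on a plain matrix of cells rather than going through ascend's SubsetCluster and RunCORE, and the helper names cluster_once and remove_unclustered, the Ward linkage, the toy data, and the max.rounds safeguard are all hypothetical.

```r
library(dynamicTreeCut)

# Hypothetical helper: cluster the rows of `cells`, one label per row.
cluster_once <- function(cells, min.size = 20) {
  distance.matrix <- dist(cells)
  tree <- hclust(distance.matrix, method = "ward.D2")
  unname(cutreeDynamic(tree, distM = as.matrix(distance.matrix),
                       minClusterSize = min.size, verbose = 0))
}

# Recluster repeatedly, dropping cluster 0 cells after each round.
remove_unclustered <- function(cells, min.size = 20, max.rounds = 10) {
  keep   <- seq_len(nrow(cells))  # row indices still in play
  labels <- cluster_once(cells, min.size)
  rounds <- 1
  # Stop when no cell is labelled 0, when the round limit is reached,
  # or when too few clustered cells remain to form a cluster.
  while (any(labels == 0) && rounds < max.rounds &&
         sum(labels != 0) >= min.size) {
    keep   <- keep[labels != 0]
    labels <- cluster_once(cells[keep, , drop = FALSE], min.size)
    rounds <- rounds + 1
  }
  list(keep = keep, labels = labels, rounds = rounds)
}

set.seed(1)
# Toy data (hypothetical): two groups of 30 cells plus 5 scattered cells.
cells <- rbind(matrix(rnorm(30 * 10, mean = 0, sd = 0.3), nrow = 30),
               matrix(rnorm(30 * 10, mean = 5, sd = 0.3), nrow = 30),
               matrix(runif(5 * 10, min = -20, max = 20), nrow = 5))

result <- remove_unclustered(cells)
table(result$labels)
```

The returned labels always line up with the rows listed in keep, so the removed cells can be recovered with setdiff(seq_len(nrow(cells)), result$keep). The round limit is there because nothing guarantees the loop settles quickly on every dataset.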

asenabouth commented 6 years ago

I think that's a good workaround in the meantime. I could also add a step where these non-clusters are removed or labelled as "unclustered". This would probably resolve the conflicts with the RunCORE algorithm.
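
As a rough idea of what the labelling option could look like, here is a small base R sketch that turns label 0 into "unclustered" while leaving the other labels alone. The function name label_unclustered is hypothetical and is not part of ascend.

```r
# Hypothetical helper: relabel cluster 0 as "unclustered".
label_unclustered <- function(labels) {
  out <- as.character(labels)
  out[labels == 0] <- "unclustered"
  # Keep real clusters in numeric order, with "unclustered" as the last level.
  cluster.levels <- sort(unique(labels[labels != 0]))
  factor(out, levels = c(cluster.levels, "unclustered"))
}

relabelled <- label_unclustered(c(1, 2, 0, 1, 0, 2))
table(relabelled)
```

Keeping the result as a factor means downstream steps can either drop the "unclustered" level or treat it as its own group.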

Regarding minimum cluster size, I guess the question to ask is how many cells you need before you can truly call it a cluster. I could see it being an issue for smaller datasets (in the hundreds of cells).

I'll mark this as an issue to work on in the new year. Thank you for flagging this @jonxujun.

jonxujun commented 6 years ago

Thanks a lot!

asenabouth commented 6 years ago

I encountered similar results in a dataset I was analysing for a collaborator. Turns out some datasets will require multiple rounds of clustering checks to ensure all outlier cells are identified and removed. I've incorporated this into the next update for ascend.