Open tbrunetti opened 2 years ago
Hi @tbrunetti ,
Based on the settings you are using and recent changes, I think the issue has to do with the subclustering splitting your cells too much (maybe down to each cell being a subcluster). Once subclusters are defined, an hclust is calculated for each subcluster independently (this is where hclust_method matters), and combined with the others. So if each cell is a subcluster, they will just get appended one after the other to the merged tree based on the iteration order.
How many cells do you have, and what does the preliminary plot look like? You can check the infercnv_obj@tumor_subclusters$subclusters slot for the list of subclusters and their content.
If that is indeed the issue, I would try lowering the leiden_resolution parameter.
Too much fragmentation in subclusters is also bad for the HMM, so improving that would benefit predictions too.
Regards, Christophe.
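To illustrate the point about per-subcluster hclust, here is a minimal base R sketch. It is not infercnv's actual merging code: order_within_subclusters is a hypothetical helper and the matrix is random toy data. It only shows that when every subcluster holds a single cell, there is nothing left for hclust to reorder, so the choice of hclust_method cannot change the result.

```r
# Toy expression-like matrix: 6 cells (rows) x 4 genes (columns).
set.seed(1)
cells <- matrix(rnorm(24), nrow = 6,
                dimnames = list(paste0("cell", 1:6), paste0("gene", 1:4)))

# Hypothetical helper (not part of infercnv): cluster each subcluster on its
# own, then concatenate the per-subcluster cell orders in iteration order.
order_within_subclusters <- function(mat, subclusters, hclust_method) {
  unlist(lapply(subclusters, function(idx) {
    if (length(idx) < 2) return(rownames(mat)[idx])  # nothing to cluster
    hc <- hclust(dist(mat[idx, , drop = FALSE]), method = hclust_method)
    rownames(mat)[idx][hc$order]
  }), use.names = FALSE)
}

singletons <- as.list(1:6)  # over-split case: every cell is its own subcluster
methods <- c("ward.D", "ward.D2", "single", "complete")

# One column of cell names per hclust method.
singleton_orders <- sapply(methods, function(m) {
  order_within_subclusters(cells, singletons, m)
})
print(singleton_orders)
```

With singleton subclusters, every column is just the input order of the cells, whichever method is used. Passing larger groups such as list(1:3, 4:6) instead lets hclust reorder cells within each group.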
Hi @tbrunetti ,
I have the same problem, and I have no idea how to fix it. I tried tuning leiden_resolution and tumor_subcluster_pval, but the result is exactly the same. Examining infercnv_obj@tumor_subclusters$subclusters showed that the algorithm uses each cell as a separate cluster. @GeorgescuC do you know something about this problem?
Thank you, Gleb
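A quick way to check for this kind of over-splitting is to count cells per subcluster. The sketch below runs on a mock object and assumes the slot is a nested list of the form subclusters[[annotation_group]][[subcluster_name]] holding cell indices; that layout is an assumption and may differ between infercnv versions. summarize_subclusters is a hypothetical helper, not an infercnv function.

```r
# Mock of the assumed nested structure:
# subclusters[[annotation_group]][[subcluster_name]] -> cell indices.
mock_subclusters <- list(
  tumor = list(
    tumor_s1 = c(cellA = 1L),
    tumor_s2 = c(cellB = 2L),
    tumor_s3 = c(cellC = 3L)
  ),
  normal = list(
    normal_s1 = c(cellD = 4L, cellE = 5L, cellF = 6L)
  )
)

# Hypothetical helper: per annotation group, report how many subclusters
# exist, how many cells they cover, and how many hold a single cell.
summarize_subclusters <- function(subclusters) {
  sizes <- lapply(subclusters, function(group) {
    vapply(group, length, integer(1))
  })
  data.frame(
    group = names(sizes),
    n_subclusters = vapply(sizes, length, integer(1)),
    n_cells = vapply(sizes, sum, integer(1)),
    n_singletons = vapply(sizes, function(s) sum(s == 1L), integer(1)),
    row.names = NULL
  )
}

summary_df <- summarize_subclusters(mock_subclusters)
print(summary_df)
```

On a real run, the equivalent call would be summarize_subclusters(infercnv_obj@tumor_subclusters$subclusters). If n_singletons is close to n_cells for a group, the subclustering has split that group down to individual cells.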
The preliminary data looks the same as the final data, and this happens with every type of input I use. I tried giving it as many as 8000 cells and other times as few as 1000 cells, each representing different samples, and not once has it clustered the way infercnv used to. I ended up using an old version of infercnv just to get the hclust to work, because it does work on a previous version, just not on the current master branch version.
@tlebchan Yeah, I think we both have the same problem. I tried messing around with leiden_resolution too, without any change or effect on the hclust.
Hi @tbrunetti @tlebchan ,
Based on @tlebchan's examination of the infercnv_obj@tumor_subclusters$subclusters data, the issue does appear to be related to over-splitting of subclusters down to 1 cell per cluster.
In the recent version, we changed how the Leiden algorithm is called: it now uses the R implementation in "igraph", because the "leidenalg" implementation that called Python started to produce errors and was no longer supported. With this, the default scoring metric used in the algorithm changed from "modularity" to "CPM" (which is theoretically an improvement). One of the differences between the two is that the value of the "resolution" threshold required to obtain a certain level of splitting changed. A resolution of 1 with "CPM" can roughly be replaced by 0.05-0.1 when using "modularity", but the new default setting of 0.05 might not work well for your datasets.
Besides that, we now also run a PCA by default to define the shared nearest neighbor graph that the Leiden algorithm runs on. If you wish to use the older method without PCA, you can set leiden_method="simple".
One of the results of these changes is a tendency to generate a number of very small clusters, which usually contain the noisiest cells.
tumor_subcluster_pval only affects subclustering when using the random trees method.
If updating to make sure you have the latest version of the code does not work, could you share a small example dataset and the options you used to produce the issue so I can debug it?
Regards, Christophe.
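As a rough sketch of how the settings discussed above could be collected for a test run: the values below are illustrative only, not recommendations, and the argument names should be checked against ?infercnv::run for the installed version. The cutoff and out_dir values in the commented call are placeholders.

```r
# Candidate subclustering settings to try (illustrative values only).
subcluster_args <- list(
  analysis_mode = "subclusters",
  tumor_subcluster_partition_method = "leiden",
  leiden_method = "simple",   # older graph construction without the PCA step
  leiden_resolution = 0.01,   # lower than the 0.05 default mentioned above
  hclust_method = "ward.D2"
)
str(subcluster_args)

# With infercnv installed and an existing infercnv_obj, the call might look
# like this (cutoff and out_dir are placeholder values):
# infercnv_obj <- do.call(
#   infercnv::run,
#   c(list(infercnv_obj, cutoff = 0.1, out_dir = "infercnv_out"),
#     subcluster_args)
# )
```

Keeping the settings in one list makes it easier to rerun with a different leiden_resolution and compare the resulting subcluster sizes between runs.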
Hello,
I have been using the newest pull from the master branch and everything works well, except now my cells will not cluster together along the y-axis. I have tried setting group_by_cluster = T and group_by_cluster = F, and both return the same result. I only have a single sample. An older version of your software properly clusters this using the same command:
Any suggestions of what to try? Thanks!
UPDATE: I have been playing around with changing the hclust_method= parameter, and I think that may be broken in the newest/current master branch? So far, I have tried setting it to ward.D, ward.D2, single, and complete, and none of them change the clustering. Additionally, the default of ward.D2 in an older version of this software does properly cluster this data set as I expect.