bnprks closed 3 weeks ago
A beautiful graph and test!
The changes are all agreeable. Do you have any opinions on Seurat and how they implement Leiden clustering? They default to `RBConfigurationVertexPartition` rather than modularity. This isn't directly available through `igraph`, but it is through the `leiden` Python package. Also, I wonder why CPM is the default for igraph.
From the docs, it looks like `RBConfigurationVertexPartition` is basically the same objective function as modularity, just with some constant scaling, so I believe this is consistent with Seurat's approach.
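For reference, here is a sketch of the objective functions under discussion, written as I read them in the leidenalg documentation for undirected graphs. The notation is an assumption on my part: $A_{ij}$ is the adjacency (or weight) matrix, $k_i$ the degree of node $i$, $m$ the total number of edges, $\sigma_i$ the community of node $i$, $\gamma$ the resolution, and CPM is shown with unit node sizes.

$$
\begin{aligned}
Q_{\text{modularity}} &= \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(\sigma_i, \sigma_j) \\
Q_{\text{RBConfiguration}} &= \sum_{ij}\left(A_{ij} - \gamma\,\frac{k_i k_j}{2m}\right)\delta(\sigma_i, \sigma_j) \\
Q_{\text{CPM}} &= \sum_{ij}\left(A_{ij} - \gamma\right)\delta(\sigma_i, \sigma_j)
\end{aligned}
$$

At $\gamma = 1$ the RBConfiguration objective is $2m$ times modularity, so both should be maximized by the same partition. The CPM penalty per node pair is a fixed $\gamma$, while the modularity-style penalty $\gamma k_i k_j / (2m)$ shrinks as the graph grows, which would be consistent with CPM needing a smaller resolution on larger datasets.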
Currently, `cluster_graph_leiden()` by default outputs a number of clusters that scales approximately linearly with the number of cells if the `resolution` parameter is held constant. This is generally undesirable and leads to problems like this, where people get thousands of clusters called on large datasets.

This pull request does the following:

- Changes the default `objective_function` from `CPM` to `modularity`
- Sets the default `resolution` back to 1

Here is the benchmarking data to justify this change. Note that Leiden modularity with `resolution = 1` gives consistent cluster sizes just like Louvain, but Leiden CPM produces a very large number of clusters on large datasets unless the `resolution` parameter is adjusted down.
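The contrast between the two objective functions can also be sketched on a toy graph using `igraph::cluster_leiden()` directly. This is only an illustration under stated assumptions, not the benchmark behind this pull request: the graph of disjoint 5-vertex cliques and the helpers `make_clique_graph()` and `count_clusters()` are hypothetical, and the resolution is left at igraph's default of 1.

```r
library(igraph)

# Hypothetical toy graph: `n_groups` disjoint cliques of `clique_size` vertices.
# This stands in for a neighbor graph; it is not the benchmark data.
make_clique_graph <- function(n_groups, clique_size = 5) {
  cliques <- lapply(seq_len(n_groups), function(i) make_full_graph(clique_size))
  do.call(disjoint_union, cliques)
}

# Number of Leiden communities for a given objective function.
# The resolution argument is not set, so igraph's default of 1 applies.
count_clusters <- function(g, objective) {
  length(cluster_leiden(g, objective_function = objective))
}

set.seed(1)
n_groups <- c(10, 20, 40)
counts <- data.frame(
  vertices   = n_groups * 5,
  cpm        = sapply(n_groups, function(n) count_clusters(make_clique_graph(n), "CPM")),
  modularity = sapply(n_groups, function(n) count_clusters(make_clique_graph(n), "modularity"))
)
print(counts)
```

On a graph like this, the CPM count would be expected to grow with the number of vertices, while the modularity count should stay close to the number of planted groups. Exact numbers may vary with the igraph version and random seed.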
cluster-resolution.csv
Click for plotting code
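```r
library(dplyr)
library(ggplot2)

# Assumption: `data` is the attached cluster-resolution.csv, with columns
# alg, resolution, cells, and clusts.
data <- read.csv("cluster-resolution.csv")

data |>
  mutate(
    # Human-readable algorithm names for the facet labels
    alg = case_match(
      alg,
      "leiden" ~ "Leiden CPM",
      "leiden-modularity" ~ "Leiden Modularity",
      "louvain" ~ "Louvain"
    ),
    # Treat resolution as an ordered discrete variable for the color scale
    resolution = factor(as.numeric(resolution), sort(unique(as.numeric(resolution))))
  ) |>
  ggplot(aes(cells, clusts, color = resolution)) +
  geom_line() +
  geom_point() +
  # Log-log axes so that linear scaling shows up as a straight line
  scale_x_continuous(
    transform = "log10",
    guide = guide_axis_logticks(),
    labels = scales::label_log(),
    breaks = c(1e5, 1e6)
  ) +
  scale_y_continuous(transform = "log10", guide = guide_axis_logticks()) +
  scale_color_manual(values = RColorBrewer::brewer.pal(9, "BuPu")[3:9]) +
  facet_wrap("alg") +
  theme_bw() +
  coord_fixed() +
  labs(
    title = "Cluster counts by resolution",
    y = "Cluster count",
    x = "Dataset size (cells)"
  )
```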