Closed antagomir closed 1 year ago
Seems reasonable, though the distance function would be a parameter of `HclustParam`, not `clusterRows`, given that not all clustering methods would easily support custom distance calculations (e.g., k-means wouldn't care).
Happy to take a PR if you can demonstrate an MVP with `HclustParam`.
Great. We will have a look and see how it goes.
Done. By @BananaCancer
The bluster package currently relies on `stats::dist` for distance calculations in the clustering process. A limitation of this is that `stats::dist` covers only a relatively small set of dissimilarity indices. For instance, it is missing many dissimilarity indices that are commonly used in ecological analyses and available, for example, through `vegan::vegdist`. Extending the set of available dissimilarity indices would help the bluster package support other applications of the SummarizedExperiment family, for instance in the microbiome research that we are working on. Providing access to readily available dissimilarity indices would support users.

Suggested solution: support `vegan::vegdist` in the bluster package. This would concern multiple functions.
The process would then look something like the following, for instance in the context of `clusterRows` and hierarchical clustering:
clusterRows(sce, distfun=stats::dist, HclustParam(metric="euclidean"))
clusterRows(sce, distfun=vegan::vegdist, HclustParam(metric="bray"))
etc.
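To make the idea concrete outside of bluster, here is a minimal base R sketch of hierarchical clustering with a pluggable distance function. It is not the bluster API: `cluster_with_dist` and `bray_curtis` are hypothetical helpers written only for illustration, and the hand-written Bray-Curtis function stands in for `vegan::vegdist` so that the example runs without extra packages. The only assumption about the distance function is that it accepts a matrix plus a `method` argument and returns a `dist` object, which is the calling convention shared by `stats::dist` and `vegan::vegdist`.

```r
# Hypothetical stand-in for vegan::vegdist(x, method = "bray"):
# Bray-Curtis dissimilarity between the rows of a non-negative matrix.
# Assumes no pair of rows sums to zero (that would divide by zero).
bray_curtis <- function(x, method = "bray") {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

# Hypothetical helper (not part of bluster): hierarchical clustering in
# which the distance function is a parameter, defaulting to stats::dist.
cluster_with_dist <- function(x, dist_fun = stats::dist,
                              metric = "euclidean", k = 2) {
  d <- dist_fun(x, method = metric)            # any function returning a 'dist'
  tree <- stats::hclust(d, method = "average") # average linkage, chosen arbitrarily
  stats::cutree(tree, k = k)                   # cut the tree into k clusters
}

# Toy count matrix: rows a/b and rows c/d have similar profiles.
counts <- rbind(
  a = c(10, 0, 0, 1),
  b = c(12, 1, 0, 0),
  c = c(0, 8, 9, 0),
  d = c(1, 7, 11, 0)
)

euclid <- cluster_with_dist(counts, stats::dist, "euclidean", k = 2)
bray   <- cluster_with_dist(counts, bray_curtis, "bray", k = 2)

# With vegan installed, the same helper should accept vegan::vegdist directly:
# cluster_with_dist(counts, vegan::vegdist, "bray", k = 2)
```

Keeping the distance function next to the metric name, rather than on the top-level clustering call, matches the point made in the thread that methods such as k-means have no use for a custom distance.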