estevezdo closed this issue 1 year ago
Well, this is a very "interesting" dataset, because it also crashes MLPACK O_O
> X <- read.csv("https://github.com/gagolews/genieclust/files/10005060/data.csv")
X <- as.matrix(X[, -1])
mlpack::emst(X)$output
Segmentation fault (core dumped)
I will keep trying to find out what's wrong with it. Meanwhile, the following seems to work:
h <- genieclust::gclust(dist(X))
print(h)
(note to self: genieclust::mst.dist is correct)
set.seed(123)
X <- read.csv("https://github.com/gagolews/genieclust/files/10005060/data.csv")
X <- as.matrix(X[, -1])
stopifnot(abs(
genieclust::gclust(dist(X), gini_threshold=1.0)$height
-
fastcluster::hclust.vector(X, "single")$height
) < 1e-12)
## OK
Mystery solved!
X features missing values, and it should not.
> arrayInd(which(is.na(X)), dim(X))
[,1] [,2]
[1,] 58 2
[2,] 58 5
[3,] 59 5
[4,] 58 6
[5,] 59 7
[6,] 58 9
[7,] 58 10
[8,] 62 10
[9,] 58 12
[10,] 62 12
[11,] 71 12
[12,] 58 14
[13,] 62 14
[14,] 49 16
[15,] 58 16
[16,] 62 16
[17,] 58 17
[18,] 58 20
[19,] 59 20
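A likely reason the dist(X) route did not crash: stats::dist() excludes missing coordinates pair by pair and scales the partial sum up, so the distances can all be finite even when X has NAs. A toy illustration (the 3x2 matrix is made up, not taken from data.csv):

```r
# Toy matrix with one missing entry (made up for illustration)
X <- rbind(c(0, 0), c(3, NA), c(3, 4))

# stats::dist() skips the NA coordinate for pairs involving row 2
# and scales the partial sum up by (total columns / columns used)
d <- dist(X)
D <- as.matrix(d)

all_finite <- all(is.finite(d))   # TRUE when no NA ends up in the distances
d12 <- D[1, 2]                    # sqrt((2 / 1) * 3^2): only column 1 is usable
d13 <- D[1, 3]                    # ordinary Euclidean distance, no NA involved
```

So the dist-based call runs on rescaled distances rather than failing, which may or may not be what one wants for rows with many gaps.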
I will patch the method so that it throws an error if there are missing values in the data.
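Until such a check is in place, the input can be screened by hand. A minimal sketch in base R; drop_nonfinite_rows is a hypothetical helper, not part of genieclust, and dropping the affected rows is only one option (an assumption here, not a recommendation from the package):

```r
# Hypothetical helper (not part of genieclust): report and drop rows
# that contain NA, NaN, or infinite entries before clustering.
drop_nonfinite_rows <- function(X) {
    stopifnot(is.matrix(X), is.numeric(X))
    bad <- !is.finite(X)                  # TRUE for NA, NaN, Inf, -Inf
    bad_rows <- which(rowSums(bad) > 0)   # indices of rows with any bad entry
    if (length(bad_rows) > 0) {
        warning(sprintf("dropping %d row(s) with non-finite values: %s",
            length(bad_rows), paste(bad_rows, collapse=", ")))
        X <- X[-bad_rows, , drop=FALSE]
    }
    X
}

# Toy data standing in for the attached data.csv; row 3 has an NA
X <- matrix(c(1, 2, NA, 4, 5, 6, 7, 8), ncol=2)
X_clean <- suppressWarnings(drop_nonfinite_rows(X))
```

The cleaned matrix can then be passed on, e.g. genieclust::gclust(X_clean).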
Thanks. This was very useful. BTW, it might be helpful to feature integration with popular heatmap packages such as ComplexHeatmap. The only thing that needs explaining is that most of these packages cluster both rows and columns, so the column clustering has to be computed on the transposed matrix. For example, in ComplexHeatmap the two options need to be set as follows, where M is some matrix:
cluster_rows = gclust(M)
cluster_columns = gclust(t(M))
Example syntax: Heatmap(M, cluster_rows = gclust(M), cluster_columns = gclust(t(M)))
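Why the transpose matters can be checked without any heatmap package: the tree passed for the columns must have one leaf per column of M. A small sketch in base R, where stats::hclust() stands in for gclust() (an assumption made only to keep the example dependency-free) and M is random toy data:

```r
set.seed(1)
M <- matrix(rnorm(6 * 4), nrow=6, ncol=4)   # toy data: 6 rows, 4 columns

# stats::hclust() is a stand-in for gclust() in this sketch
row_tree <- hclust(dist(M))      # dist() works on rows: clusters the 6 rows
col_tree <- hclust(dist(t(M)))   # transposing turns columns into rows

n_row_leaves <- length(row_tree$order)   # should equal nrow(M)
n_col_leaves <- length(col_tree$order)   # should equal ncol(M)
```

Passing a tree built from M (rather than t(M)) as the column clustering would give it nrow(M) leaves, which does not match the number of columns unless M happens to be square.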
Nice use case, thanks!
Still, as a fan of minimalism, I'd rather refrain from introducing such functionality separately - it can easily be obtained manually (at the cost of an additional call to the built-in transpose, as you've kindly shown above).
I am having issues applying gclust() to my data. I get the following error:
Error in .mst.default(d, distance, M, cast_float32, verbose) : genieclust: Assertion std::isfinite(Dnn[bestj]) failed in ./c_mst.h:489
When using the verbose = TRUE argument, these are the details I get:
[genieclust] Computing the MST.
[genieclust] Computing the MST... 99%
Error in .mst.default(d, distance, M, cast_float32, verbose) : genieclust: Assertion std::isfinite(Dnn[bestj]) failed in ./c_mst.h:489
I am attaching the matrix I am using (the first column holds the row names of my matrix):
data.csv