fsavje / scclust-R

Size-constrained Clustering in R
GNU General Public License v3.0

hierarchical_clustering() function crashes my R session #14

Open JJ585 opened 2 hours ago

JJ585 commented 2 hours ago

I have a problem running the hierarchical_clustering() function on both R for Windows and a Linux RStudio Server. In both cases the function crashes my R session when I use the existing_clustering parameter.

[Screenshots attached: 2024-10-15 141349, 2024-10-15 141824, 2024-10-15 143216]

I tried to debug this function, but the problem is at the C level, inside the .Call(Rscc_hierarchical_clustering) call. It is probably a segmentation fault.

Reproducible example here.

My setup: Red Hat Enterprise Linux release 9.4 (Plow) or Windows 10; R 4.4.1, or RStudio Server 2024.09.0 Build 375 with R 4.4.1; scclust version 0.2.5.

fsavje commented 2 hours ago

Hi JJ585,

Thanks for the bug report.

I just ran your reproducible example, but I don't get an error. See printout below. Also using R 4.4.1 with scclust 0.2.5.

Could you provide more details on how the bug arises?

R version 4.4.1 (2024-06-14) -- "Race for Your Life"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(scclust)
Loading required package: distances
> load("./data.rda")
> table(clust1)
clust1
  0   1   2 
106  90  66 
> clust2 <- hierarchical_clustering(distances = dat.dist,
+                                   size_constraint = 50,
+                                   existing_clustering = clust1)
> table(clust2)
clust2
 0  1  2  3 
66 90 50 56 

JJ585 commented 1 hour ago

The error occurs when the hierarchical_clustering function is called (Shift+Enter). It executes for a short time, and then a window appears stating that the session has been terminated. I tried stepping into the function and running it line by line, and the error occurs at

clustering <- .Call(Rscc_hierarchical_clustering, distances, size_constraint, batch_assign, existing_clustering)

fsavje commented 1 hour ago

Ok. I still cannot reproduce the bug, which makes it difficult to hunt down.

I'm using macOS, so this might be something that doesn't show up on that platform. I will try to reproduce it on other platforms, but that will take a bit longer. Stay tuned.

Thanks again for the careful bug report. I very much appreciate it.

JJ585 commented 1 hour ago

I added a screen recording (mp4) to show how the error happens.

The error occurs at the external .Call(). [image]

fsavje commented 1 hour ago

Thanks!

To clarify, I wasn't doubting whether you had the bug. But I need to reproduce it in order to start investigating where it comes from.