david-cortes / outliertree

(Python, R, C++) Explainable outlier/anomaly detection through decision tree conditioning
http://outliertree.readthedocs.io
GNU General Public License v3.0

outlier.tree Hangs Indefinitely with Certain Numeric Columns Present in Dataset #9

Status: Closed (statadvice closed 1 week ago)

statadvice commented 2 weeks ago

Hello! The outlier.tree process does not complete when running on a dataset with only numeric variables, but it finishes quickly if I remove columns 4 and 6. I haven't identified any specific characteristics in these columns that could be causing the slowdown (or even non-termination).

This keeps executing for hours, even days, without terminating:

library(outliertree)
library(rio)  # provides import()
# Load all 6 numeric columns from the Excel file
selected_data <- import("https://data.statadvice.com/selected_data.xlsx")
outliers_model <- outliertree::outlier.tree(selected_data)

This finishes quickly:

library(outliertree)
library(rio)  # provides import()
# Same data, but with columns 4 and 6 dropped
selected_data <- import("https://data.statadvice.com/selected_data.xlsx")[, -c(4, 6)]
outliers_model <- outliertree::outlier.tree(selected_data)
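
One way to look for distinguishing characteristics in the suspect columns is to tabulate a few basic per-column statistics and compare columns 4 and 6 against the rest. The sketch below is only an illustration in base R. The helper name `summarize_numeric_columns` is hypothetical and not part of outliertree, and `toy_data` is made-up data standing in for `selected_data`. It does not identify the actual cause of the hang, which this thread does not state.

```r
# Hypothetical helper (not part of outliertree): basic diagnostics for
# each column of a data frame that contains only numeric columns.
summarize_numeric_columns <- function(df) {
  stats <- lapply(df, function(x) {
    finite <- x[is.finite(x)]  # drops NA, NaN, Inf and -Inf
    data.frame(
      n_missing = sum(is.na(x)),
      n_unique  = length(unique(finite)),
      min       = if (length(finite) > 0) min(finite) else NA_real_,
      max       = if (length(finite) > 0) max(finite) else NA_real_,
      sd        = if (length(finite) > 1) sd(finite) else NA_real_
    )
  })
  # One row per column of the input data frame
  do.call(rbind, stats)
}

# Toy stand-in for selected_data (made-up values, not the data from this issue)
toy_data <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(0, 0, 0, 0, 1e9),
  c = c(1.5, NA, 2.5, 3.5, 4.5)
)

col_summary <- summarize_numeric_columns(toy_data)
print(col_summary)
```

With the real data, the call would be `summarize_numeric_columns(selected_data)`. Columns with very few unique values or an extreme range would be natural candidates to inspect first.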

david-cortes commented 2 weeks ago

Thanks for the bug report. I unfortunately cannot access the domain data.statadvice.com and thus cannot download the file.

Would you be able to attach it here, or upload it to some other service like Dropbox?

Also a few questions:

statadvice commented 1 week ago

@david-cortes Here is the data file: selected_data.xlsx

OS: Windows 10, 64-bit
CPU: x64 (64-bit)
Installation: pre-built version from CRAN
Data: 3200 observations of 6 numeric variables, in the 174 KB Excel file attached above

david-cortes commented 1 week ago

Thanks. I was able to download the data and can reproduce the issue - will investigate.

david-cortes commented 1 week ago

Thanks, this should be solved now. It will take a while for the update to reach CRAN, but in the meantime the updated version can be installed from GitHub:

remotes::install_github("david-cortes/outliertree")