david-cortes / outliertree

(Python, R, C++) Explainable outlier/anomaly detection through decision tree conditioning
http://outliertree.readthedocs.io
GNU General Public License v3.0

Crashing RStudio #6

Open systemnova opened 3 years ago

systemnova commented 3 years ago

When running the following, RStudio crashes:

    p_load(outliertree, lubridate)
    df_temp <- df %>% select(!where(is.Date)) #%>% outlier.tree(nthreads=1)
    df_temp <- sample_n(df_temp, 10000)
    otree <- outlier.tree(df_temp, max_depth = 0)

If I randomly sample down to 2000 rows, it works, so it looks like a validation or bad-data issue somewhere. No other function has crashed when run on the whole dataset, so the problem appears to be in outliertree.

Unfortunately I'm unable to share the dataset, but it has known column-shift issues: commas in the CSV appear in unexpected places, and strings appear in mostly numeric columns. Apologies that I can't be more specific; hopefully this is of some benefit. Because RStudio hard crashes, I'm unable to produce any other output (e.g. warnings()).

david-cortes commented 3 years ago

Thanks for the bug report. A couple questions:

systemnova commented 3 years ago

> Does it crash while building the model, or after it has already been built and is being used? If the latter, it should be solved in version 1.7.2 (currently in CRAN).

It crashes in the first few seconds of running outlier.tree. I'm using version 1.7.1 and unfortunately can't compile binaries or use remotes to install the newer version.

> What types of columns does the data contain? How many?

40 columns: 33 character and 7 integer. I've just inspected the data a little more closely and noticed some POSIXct variables still left after the date filter. I've added a POSIXct selection now, but it still crashes when running on the whole dataset. I'm wondering if a POSIXct/Date value in a column where it shouldn't be is causing the issue.

> Does it crash if you set outliers_print=0?

Yes, it still crashes.

> Do you see some error message? E.g. something looking like this: std::bad_alloc()

No, it's a hard crash with a popup dialogue saying "R Session Aborted - R encountered a fatal error. The session was terminated". No additional output is visible in the console.
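
For reference, the Date/POSIXct filtering described in this reply could look roughly like the sketch below. It uses base R rather than the dplyr `select(!where(...))` call from the original post, and the data frame `df` here is a small hypothetical stand-in, since the real dataset was not shared. It is only meant to illustrate one way of dropping date-like columns before calling outlier.tree, not the exact code that was run.

```r
# Hypothetical stand-in for the real data, which was not shared in the thread.
df <- data.frame(
  id    = 1:3,
  label = c("a", "b", "c"),
  day   = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03")),
  stamp = as.POSIXct(c("2021-01-01 10:00:00", "2021-01-02 11:00:00",
                       "2021-01-03 12:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Flag columns whose class is Date or any POSIXt variant (POSIXct / POSIXlt).
is_datetime <- vapply(df, function(col) inherits(col, c("Date", "POSIXt")),
                      logical(1))

# Keep the remaining columns; drop = FALSE preserves the data.frame shape.
df_temp <- df[, !is_datetime, drop = FALSE]

# Print the first class of each remaining column to confirm that
# nothing date-like slipped through.
print(vapply(df_temp, function(col) class(col)[1], character(1)))
```

Note that this only catches columns typed as dates; a date value stored as text inside a character column would not be flagged by a class check like this.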

david-cortes commented 3 years ago

Couple more questions: