bgreenwell closed this issue 3 years ago.
Just realized this is noted in the README (apologies for not noticing that first), and the issue seems to be fixed when setting nthreads = 1. Nonetheless, wouldn't it be appropriate to leave this issue open? This is especially problematic since nthreads > 1 by default in most cases. Perhaps consider changing the default to nthreads = 1.
It should be reproducible, but there was a bug in previous versions that could make it irreproducible, especially when ndim is odd. Some questions:
- Are you using the latest version?
- Is it also irreproducible if you use coefs="uniform"?
- Is it also irreproducible if you use ndim=1 or ndim=2?

Ahh, I was indeed using an older version. It seems to work fine in the latest CRAN release (v0.2.7)! Thanks for the quick response, @david-cortes.
Hi David,
I know this issue is closed, but rather than opening a duplicate, I'll post here: I have the same issue. I believe the results should be reproducible; however, when ndim = 1, they are not.
Regarding coefs="uniform": I used "normal", which is the default. ndim=2 works fine.
My data has missing values in some variables. In my tests, when I set missing_action = "impute", the results are reproducible, but when I set missing_action = "divide", they are not. And as the author said, nthreads = 1 solves this issue.

@ThomasZhang717 Thanks for the information. However, I'm unable to reproduce the problem.
The following snippet always gives the same result for me:
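library(isotree)
set.seed(1)
X <- matrix(rnorm(100 * 5), ncol=5)
# Set up to 20 randomly chosen entries to NA to introduce missing values
rnd_ix <- matrix(c(sample(100, size=20, replace=TRUE),
                   sample(5, size=20, replace=TRUE)), ncol=2)
X[rnd_ix] <- NA
# Single-variable splits, missing values divided between branches, fixed seed, 3 threads
model <- isolation.forest(X, ndim=1, missing_action="divide",
                          random_seed=123, nthreads=3)
predict(model, X)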
A couple of questions:
- What kind of input data are you passing? (e.g., data.frame, matrix, column types, etc.)
- Are you able to make a small example with random data?
Hi David,
I tried the same code from your reply, and yes, it gives me the same results. However, if I add sample_size = 50 to isolation.forest, the results sometimes come out slightly different, as shown in A and B below. I left this out of my last reply, my bad.
A.
[1] 0.4268346 0.4931192 0.5216219 0.4354687 0.5143806 0.5548655 0.4078764 0.4211939 0.4374525
[10] 0.4108939 0.4422375 0.3918356 0.4751745 0.5103656 0.4215653 0.4399065 0.4014527 0.4310644
[19] 0.4747911 0.4199024 0.5115887 0.4848320 0.3998537 0.4907188 0.4206371 0.4873190 0.4159650
[28] 0.4799154 0.4195693 0.3867677 0.4945219 0.5163964 0.4270563 0.4516901 0.4399254 0.4307343
[37] 0.4139854 0.4142398 0.4335430 0.4079231 0.4539450 0.5146048 0.4380907 0.4149262 0.5220550
[46] 0.5205168 0.4969488 0.3905622 0.4176541 0.5524954 0.4501555 0.3969415 0.3859549 0.4474579
[55] 0.5252444 0.4851237 0.5147711 0.4598579 0.4973965 0.4574082 0.5514348 0.4821773 0.4498562
[64] 0.4313696 0.4457145 0.4796373 0.4570787 0.4943975 0.4146116 0.4703239 0.4833987 0.4250820
[73] 0.4212746 0.4949767 0.4767075 0.4213967 0.5084094 0.4753492 0.4625709 0.4879854 0.4181288
[82] 0.4450953 0.4618560 0.4974802 0.4780901 0.4630327 0.5233225 0.4190874 0.4849765 0.4408392
[91] 0.4337941 0.4856868 0.4475284 0.4498174 0.5686211 0.4389855 0.5160242 0.4328694 0.4403108
[100] 0.4185283
and
B.
[1] 0.4265170 0.4931192 0.5217908 0.4353271 0.5138443 0.5547556 0.4077856 0.4211001 0.4374525
[10] 0.4107204 0.4417625 0.3916397 0.4751519 0.5110519 0.4213545 0.4401813 0.4013222 0.4309243
[19] 0.4747284 0.4197659 0.5112202 0.4844610 0.3996554 0.4907188 0.4205003 0.4876234 0.4156499
[28] 0.4797594 0.4197446 0.3865743 0.4942121 0.5163964 0.4268513 0.4511156 0.4399044 0.4305637
[37] 0.4138018 0.4141051 0.4333262 0.4078067 0.4538551 0.5145708 0.4380907 0.4147913 0.5220550
[46] 0.5205168 0.4967287 0.3904352 0.4171229 0.5523859 0.4498859 0.3966461 0.3858294 0.4471190
[55] 0.5246195 0.4848811 0.5147711 0.4595157 0.4973965 0.4573865 0.5513256 0.4818207 0.4501879
[64] 0.4312980 0.4458587 0.4791602 0.4575123 0.4937687 0.4144768 0.4703239 0.4833757 0.4247656
[73] 0.4215005 0.4951402 0.4767075 0.4212098 0.5080533 0.4748763 0.4625489 0.4879044 0.4175143
[82] 0.4448981 0.4621184 0.4975787 0.4780350 0.4628822 0.5233225 0.4189015 0.4849765 0.4410233
[91] 0.4334816 0.4855787 0.4471363 0.4497960 0.5685048 0.4391440 0.5157833 0.4326775 0.4405469
[100] 0.4182583
I ran the entire code 10 times. Most of the time it gives me A; sometimes it gives B. A and B differ only slightly, e.g. in the first observation (0.4268346 vs. 0.4265170).
My data has almost the same format as your example; the difference is that I use a data frame.
@ThomasZhang717 I'm still unable to reproduce the problem.
The following code runs without errors on my setup:
library(isotree)
set.seed(1)
X <- matrix(rnorm(100 * 5), ncol=5)
rnd_ix <- matrix(c(sample(100, size=20, replace=TRUE),
                   sample(5, size=20, replace=TRUE)), ncol=2)
X[rnd_ix] <- NA
set.seed(1)
# Fit 100 models with the same random_seed but a randomly chosen thread count (2 to 4)
for (i in 1:100) {
    model <- isolation.forest(X, ndim=1, missing_action="divide",
                              sample_size=50, random_seed=123,
                              nthreads=sample(3, size=1)+1)
    pred <- predict(model, X)
    # Any difference from the previous run means the scores are not reproducible
    if (i > 1) {
        diff <- abs(pred - last_pred)
        if (any(diff != 0))
            stop("Different results")
    }
    last_pred <- pred
}
The same holds if I change it to as.data.frame(X) or use a more uneven distribution such as rgamma.
Some more questions:
- ~/.R/Makevars file or similar?
- Rcpp after having installed isotree?
I was now able to reproduce the issue, but only on Windows. Will investigate.
Yes, I was trying this on my old Mac, where there is no issue; it works perfectly. However, when I move to Windows, it happens.
For some more information: makevars file.
Thanks, David. ;)
@ThomasZhang717 I've pushed a small update that should fix the problem. Could you try the latest version from GitHub and see whether you still experience this bug?
remotes::install_github("david-cortes/isotree")
I tried to install the package, but it gives me an error. I tried both remotes and devtools, and both give me the same error during installation.
Error message: "mult.hpp:959:13: warning: enumeration value 'Divide' not handled in switch [-Wswitch] make: *** [C:/PROGRA~1/R/R-41~1.0/etc/i386/Makeconf:245: Rwrapper.o] Error 1"
Warning message: "In i.p(...) : installation of package ‘C:/Users/Thomas/AppData/Local/Temp/RtmpgTgzmR/file319c4452737e/isotree_0.2.10.tar.gz’ had non-zero exit status"
Fixed again. Could you give it another try now?
I tried the example code and my own code, and the issue seems to be solved. Congrats. ;)
Hi @david-cortes, thanks for a great package. I'm writing a book on tree-based methods and am including a section on isolation forests using your package (which works really well). I've noticed, however, that the anomaly scores are not reproducible (at least for me) when specifying the seed via set.seed() or the random_seed argument. Reproducible example below:
Is this a bug, or am I missing something?