david-cortes / isotree

(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)
https://isotree.readthedocs.io
BSD 2-Clause "Simplified" License

Typo: predict.isotree() documentation (type="avg_depth" not accepted) #5

Closed (sarahef2 closed 4 years ago)

sarahef2 commented 4 years ago

In isotree/R/isoforest.R, the documentation for the function predict.isolation_forest says that it allows the following types:

@param type Type of prediction to output. Options are:
\itemize{
  \item "score" for the standardized outlier score, where values closer to 1 indicate more outlierness, while values closer to 0.5 indicate average outlierness, and close to 0 more averageness (harder to isolate).
  \item "avg_depth" for the non-standardized average isolation depth.
  \item "dist" for approximate pairwise distances (must pass more than 1 row) - these are standardized in the same way as outlierness, values closer to zero indicate nearer points, closer to one further away points, and closer to 0.5 average distance.
  \item "avg_sep" for the non-standardized average separation depth.
  \item "tree_num" for the terminal node number for each tree - if choosing this option, will return a list containing both the outlier score and the terminal node numbers, under entries score and tree_num, respectively.
  \item "impute" for imputation of missing values in newdata.
}

However, avg_depth is not in the allowed types within the code, so trying to use the argument type="avg_depth" returns an error:

    > n <- 100
    > m <- 2
    > X <- matrix(rnorm(n * m), nrow = n)
    > X <- rbind(X, c(3, 3))
    > iso <- isolation.forest(X, ntrees = 10, nthreads = 1)
    > dpths <- predict(iso, X, type="avg_depth")
    Error in check.str.option(type, "type", allowed_type) :
      'type' must be one of "score", "avg_path", "dist", "avg_sep", "tree_num", "impute".

From line 758 of the same script: allowed_type <- c("score", "avg_path", "dist", "avg_sep", "tree_num", "impute")

It looks like avg_path is meant to be avg_depth, or the other way around.
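
For illustration only, here is a minimal Python sketch of the mismatch described above. It is not the package's code: `check_str_option` is a hypothetical stand-in that mimics the behaviour of the R helper `check.str.option` as seen in the error message, and `doc_code_mismatch` is an assumed helper for comparing the documented options with the accepted ones. The two tuples are copied from the `@param` text and from the `allowed_type` line quoted in this issue.

```python
# Hypothetical Python mirror of the R-side validation; all function names here
# are illustrative assumptions, not part of the isotree API.

# Options accepted by the code (the allowed_type vector quoted in the issue).
ALLOWED_TYPE = ("score", "avg_path", "dist", "avg_sep", "tree_num", "impute")

# Options listed in the @param documentation quoted in the issue.
DOCUMENTED_TYPE = ("score", "avg_depth", "dist", "avg_sep", "tree_num", "impute")


def check_str_option(value, name, allowed):
    """Return `value` if it is in `allowed`; otherwise raise ValueError."""
    if value not in allowed:
        options = ", ".join(f'"{opt}"' for opt in allowed)
        raise ValueError(f"'{name}' must be one of {options}.")
    return value


def doc_code_mismatch(documented, allowed):
    """Return (documented but rejected, accepted but undocumented) names."""
    documented_only = sorted(set(documented) - set(allowed))
    allowed_only = sorted(set(allowed) - set(documented))
    return documented_only, allowed_only


if __name__ == "__main__":
    # Lists the option names that appear on only one side.
    print(doc_code_mismatch(DOCUMENTED_TYPE, ALLOWED_TYPE))
    try:
        check_str_option("avg_depth", "type", ALLOWED_TYPE)
    except ValueError as err:
        print(err)
```

With the values quoted here, the comparison reports "avg_depth" as documented but rejected and "avg_path" as accepted but undocumented, which is the single rename that separates the two lists.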

david-cortes commented 4 years ago

Thanks for the bug report. This is now fixed in the master branch, with the parameter named avg_depth in both the R and Python versions.