randomForest can run in unsupervised mode: instead of sending it a formula and a training set, you can just send it the whole dataframe and tell it to return a proximity matrix:
library(randomForest)
runsup <- randomForest(redwine, proximity = TRUE)
The matrix runsup$proximity is a similarity matrix rather than a distance matrix: it scores how often two observations (rows) ended up in the same terminal node of a tree. So in this case, wines that are similar should have high proximity scores, and 1 - runsup$proximity can be used as a pseudo-distance.
Interpretation: If you liked wine 17, you are also likely to like wines 1157, 373, 69, 527, and 922.
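A ranking like the one behind this interpretation can be sketched as follows. This is only an illustration: most_similar is a hypothetical helper (not part of the randomForest package), and the small toy matrix is made up so the snippet runs on its own without the wine data.

```r
# Hypothetical helper: return the n rows most similar to row i,
# given a square proximity (similarity) matrix such as runsup$proximity.
most_similar <- function(prox, i, n = 5) {
  scores <- prox[i, ]
  scores[i] <- -Inf                      # exclude the observation itself
  n <- min(n, length(scores) - 1)        # cannot return more rows than exist
  top <- order(scores, decreasing = TRUE)[seq_len(n)]
  data.frame(row = top, proximity = scores[top])
}

# Toy 4 x 4 proximity matrix (made-up values, for illustration only)
toy <- matrix(c(1.0, 0.2, 0.7, 0.1,
                0.2, 1.0, 0.3, 0.6,
                0.7, 0.3, 1.0, 0.4,
                0.1, 0.6, 0.4, 1.0), nrow = 4, byrow = TRUE)

most_similar(toy, i = 1, n = 2)   # rows 3 and 2, in that order
```

With the forest fitted above, most_similar(runsup$proximity, i = 17, n = 5) would list the five wines closest to wine 17. The forest is random, so the exact neighbours can change between runs unless you call set.seed() first.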