carpentries-incubator / r-ml-tabular-data

A Data-Carpentry-style lesson on some ML techniques in R
https://carpentries-incubator.github.io/r-ml-tabular-data/

Consider discussing unsupervised mode in ep 4 #34

Open: djhunter opened this issue 2 years ago

djhunter commented 2 years ago

randomForest can also run in unsupervised mode. Instead of passing it a formula and a training set, you can pass it the whole data frame with no response variable and set proximity = TRUE so that it returns a proximity matrix:

library(randomForest)
runsup <- randomForest(redwine, proximity = TRUE)
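As far as I understand Breiman and Cutler's description of unsupervised random forests, when no response is given the forest is trained to tell the real rows apart from a synthetic second class. That class is built by sampling each column independently from its own values, which keeps every column's marginal distribution but breaks the relationships between columns. The sketch below builds such a contrast data set in base R. The helper make_synthetic_contrast() and the column name origin are hypothetical, and this is an illustration of the idea rather than the package's internal code.

```r
# Hypothetical helper (not part of randomForest): stack the real rows on top of
# a synthetic copy whose columns are resampled independently of each other.
make_synthetic_contrast <- function(df) {
  # Resample each column from its own observed values (with replacement).
  # Marginal distributions are kept; dependence between columns is destroyed.
  synthetic <- as.data.frame(
    lapply(df, function(col) col[sample.int(length(col), replace = TRUE)]),
    stringsAsFactors = FALSE
  )
  names(synthetic) <- names(df)
  combined <- rbind(df, synthetic)
  rownames(combined) <- NULL
  # Label which rows are real observations and which are synthetic.
  combined$origin <- factor(rep(c("real", "synthetic"), each = nrow(df)))
  combined
}

# Example on a small made-up data frame (not the wine data).
set.seed(1)
toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
contrast <- make_synthetic_contrast(toy)
table(contrast$origin)
```

Fitting something like randomForest(origin ~ ., data = contrast, proximity = TRUE) and keeping only the rows and columns of the proximity matrix that belong to real observations would roughly mirror the idea; the package's own unsupervised mode may differ in its details.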

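The matrix runsup$proximity is a similarity matrix rather than a distance matrix: each entry [i, j] lies between 0 and 1 and reflects how often rows i and j ended up in the same terminal node across the trees of the forest (1 - proximity can serve as a pseudo-distance). So in this case, wines with similar properties should have high proximity scores.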
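The definition of proximity can be illustrated in base R. Assume, hypothetically, a matrix nodes with one row per observation and one column per tree, holding the terminal node that each observation lands in. Proximity [i, j] is then the fraction of trees in which rows i and j share a node. This is a simplified sketch: randomForest's own computation may differ in details (for example, in how out-of-bag observations are counted), and proximity_from_nodes() is a hypothetical helper.

```r
# Hypothetical terminal-node assignments: 3 observations (rows) x 4 trees (columns).
nodes <- matrix(c(
  3, 5, 2, 7,  # observation 1
  3, 5, 4, 7,  # observation 2
  1, 6, 2, 8   # observation 3
), nrow = 3, byrow = TRUE)

# proximity[i, j] = fraction of trees in which observations i and j share a terminal node.
proximity_from_nodes <- function(nodes) {
  n <- nrow(nodes)
  prox <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      prox[i, j] <- mean(nodes[i, ] == nodes[j, ])
    }
  }
  # Carry over observation names, if there are any.
  rownames(prox) <- colnames(prox) <- rownames(nodes)
  prox
}

prox <- proximity_from_nodes(nodes)
prox
# prox[1, 2] is 0.75: observations 1 and 2 share a node in trees 1, 2 and 4.
```

Every observation always shares a node with itself, which is why the diagonal is 1.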

head(sort(runsup$proximity[17,], decreasing = TRUE))

Output:

       17      1157       373        69       527       922 
1.0000000 0.1707317 0.1304348 0.1267606 0.1111111 0.1076923 

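Interpretation: the first entry is wine 17 itself (every row has proximity 1 with itself). Wines 1157, 373, 69, 527, and 922 have the highest proximity to wine 17, so they are the most similar to it in their measured properties. If you liked wine 17, you might also like these wines.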
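For a lesson, a small helper that returns the k most similar rows while leaving out the row itself might read more cleanly than the head(sort(...)) call. The function most_similar() below is a hypothetical sketch that works on any proximity matrix with column names; i is a row position, not a row name.

```r
# Hypothetical helper: the k rows with the highest proximity to row i,
# excluding row i itself (its self-proximity is 1).
most_similar <- function(prox, i, k = 5) {
  scores <- prox[i, -i]  # drop the self-proximity entry
  head(sort(scores, decreasing = TRUE), k)
}

# Small made-up proximity matrix for illustration.
toy_prox <- matrix(c(1.00, 0.75, 0.25,
                     0.75, 1.00, 0.00,
                     0.25, 0.00, 1.00),
                   nrow = 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
most_similar(toy_prox, 1, k = 2)
# b: 0.75, c: 0.25
```

With the forest above, most_similar(runsup$proximity, 17) should list the same five wines as the head(sort(...)) output, without wine 17 itself.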