carpentries-incubator / machine-learning-trees-python

Introduction to tree models with Python
https://carpentries-incubator.github.io/machine-learning-trees-python

ed-dash comments #26

Closed: alanocallaghan closed this issue 5 months ago

alanocallaghan commented 5 months ago

base_estimator should be changed to estimator, since the argument was renamed in recent versions of sklearn

On the random forest page, we specify max_features=1, but the decision boundaries are all bivariate. This makes for a very confusing introduction to random forests: https://carpentries-incubator.github.io/machine-learning-trees-python/06-random-forest/index.html

tompollard commented 5 months ago

I've found this lesson to work well with just two features, but I do play around with some of the parameters to demonstrate what is happening. These should be captured in the materials, so I'll try to make some updates to explain things more clearly.

alanocallaghan commented 5 months ago

What I mean is that if we're fitting a random forest to two variables, then I'd expect the feature subsampling to produce trees with one feature; otherwise it's just a regular tree ensemble

tompollard commented 5 months ago

> What I mean is that if we're fitting a random forest to two variables, then I'd expect the feature subsampling to produce trees with one feature; otherwise it's just a regular tree ensemble

One of the nice things about dealing with only two variables is that we can demonstrate that this expectation is not true for random forests (at least for this particular implementation).

If it were true that setting max_features=1 led to trees that each use a single variable, we would not see the following trees (which all make decisions based on both variables).

[Image: trees from the fitted random forest]

The explanation is that max_features limits the features considered at each split, not the features available to each tree:

[Screenshot 2024-03-27 at 11:11:34 AM]
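The same point can be checked in code. The sketch below is an illustration only: the synthetic data, tree count, and depth are assumptions, not the lesson's settings. It fits a forest with max_features=1 on two features and lists which features each tree actually splits on.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-feature data (an assumption, standing in for the lesson's dataset).
X, y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# max_features=1: one randomly chosen feature is considered at each split.
forest = RandomForestClassifier(
    n_estimators=10, max_features=1, max_depth=3, random_state=0
)
forest.fit(X, y)

# tree_.feature holds the feature index used at each node; leaves are negative.
features_per_tree = []
for tree in forest.estimators_:
    used = sorted({int(f) for f in tree.tree_.feature if f >= 0})
    features_per_tree.append(used)

n_bivariate = sum(len(used) == 2 for used in features_per_tree)

for i, used in enumerate(features_per_tree):
    print(f"tree {i}: splits on features {used}")
print(f"{n_bivariate} of {len(features_per_tree)} trees use both features")
```

Because the feature is redrawn at every split, a tree with more than one split can end up using both variables, which is why the decision boundaries are bivariate.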

alanocallaghan commented 5 months ago

Ah. In that case it'd be good to explain that in the lesson.

tompollard commented 5 months ago

@alanocallaghan Please could you take a look at https://github.com/carpentries-incubator/machine-learning-trees-python/pull/27 and let me know if this resolves the issue?