tararae7 closed this issue 3 years ago.
It works as follows: when it chooses a column at random, each column is picked with probability equal to its weight divided by the sum of the weights of the remaining splittable columns (i.e. columns that have at least 2 distinct values and, when using ndim > 1, have not already been selected for the current split).
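As an illustration of the rule described above, here is a minimal sketch in plain Python. It is not isotree's actual implementation. The function names `pick_probabilities` and `pick_columns` and their arguments are hypothetical. It assumes "splittable" is known per column at the current node, and that with ndim > 1 the columns are drawn one at a time without replacement, re-normalizing the weights after each draw.

```python
import random


def pick_probabilities(weights, splittable, already_used=()):
    """Selection probability for each eligible column at the current split.

    weights: user-supplied column weights (hypothetical argument name).
    splittable: one boolean per column, True if the column has at least
        2 distinct values in the current node.
    already_used: indices of columns already chosen for this split
        (only matters when ndim > 1).
    """
    # Keep only columns that can still be split on and were not already picked.
    eligible = [i for i, ok in enumerate(splittable) if ok and i not in already_used]
    total = sum(weights[i] for i in eligible)
    # Each eligible column's weight divided by the sum over eligible columns.
    return {i: weights[i] / total for i in eligible}


def pick_columns(weights, splittable, ndim, rng=random):
    """Draw ndim distinct columns, re-normalizing weights after each draw."""
    chosen = []
    for _ in range(ndim):
        probs = pick_probabilities(weights, splittable, chosen)
        if not probs:
            break  # no splittable columns left
        cols = list(probs)
        col = rng.choices(cols, weights=[probs[c] for c in cols], k=1)[0]
        chosen.append(col)
    return chosen


# With weights (1, 5, 1) and all columns splittable, column 1 gets 5/7.
print(pick_probabilities([1, 5, 1], [True, True, True]))
# If column 1 is not splittable in this node, the other two split the mass 50/50.
print(pick_probabilities([1, 5, 1], [True, False, True]))
```

Because the denominator only sums over the remaining eligible columns, the same weights can give different probabilities at different nodes of a tree.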
Columns are picked by a deterministic criterion when passing ndim=1 together with a pooled or averaged gain criterion.
Thank you so much for responding, David. I have some questions about your answer. If I set weights such as (1,5,1), then from your explanation the second feature would have a probability of being randomly picked of 5/2 = 2.5, i.e. 250%. Is that correct? Does that only apply to the first feature split in each tree? Please help me understand. If there is documentation explaining this, please let me know and I can go there.
If you pass weights (1,5,1), then the probabilities are (1/7, 5/7, 1/7): each weight is divided by the sum of all the weights, 1 + 5 + 1 = 7.
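The arithmetic can be checked with a short sketch using Python's `fractions` module. It assumes all columns are splittable, so every weight appears in the denominator. The helper name `column_probabilities` is hypothetical and not part of isotree.

```python
from fractions import Fraction


def column_probabilities(weights):
    """Normalize column weights so they sum to 1 (all columns assumed splittable)."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


probs = column_probabilities([1, 5, 1])
print(probs)            # [Fraction(1, 7), Fraction(5, 7), Fraction(1, 7)]
print(float(probs[1]))  # roughly 0.714, i.e. about 71%, not 250%
```

Since the weights are normalized, probabilities always add up to 1, and only the ratios between weights matter: (1,5,1) and (2,10,2) give the same result.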
Hi David (typo in the header: I meant column weights, not sample weights),
I don't believe there is a full explanation of how the column_weights parameter gets applied in the isotree model. I understand that if I have 5 features I can pass a list to this parameter such as (5,2,3,4,7); in this case my fifth feature has the highest weight, but what does that actually do in the model? Also, the help for this parameter says "Ignored when picking columns by deterministic criterion". How do you pick columns by the deterministic criterion? Is that the extended model? Thank you!