bethatkinson / rpart

Recursive Partitioning and Regression Trees

How is the 'variable.importance' calculated in the rpart package? #43

Open statwangz opened 2 years ago

statwangz commented 2 years ago

In xgboost, there are several importance types, including ‘weight’, ‘gain’, ‘cover’, ‘total_gain’, and ‘total_cover’. I wonder how rpart calculates its importance score.

bethatkinson commented 2 years ago

From the vignette:

An overall measure of variable importance is the sum of the goodness of split measures for each split for which it was the primary variable, plus goodness * (adjusted agreement) for all splits in which it was a surrogate. In the printout these are scaled to sum to 100 and the rounded values are shown, omitting any variable whose proportion is less than 1%. Imagine two variables which were essentially duplicates of each other; if we did not count surrogates they would split the importance with neither showing up as strongly as it should.
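The rule quoted above can be illustrated with a minimal Python sketch. This is not rpart's code: the `nodes` structure, the function names, and the numbers are hypothetical, chosen only to show the arithmetic. Each node is assumed to carry one primary split with its goodness value plus a list of surrogates with their adjusted agreement.

```python
from collections import defaultdict


def variable_importance(nodes):
    """Sum goodness for primary splits, plus goodness * adjusted agreement
    for every split in which the variable appears as a surrogate."""
    totals = defaultdict(float)
    for node in nodes:
        goodness = node["goodness"]
        # The primary variable receives the full goodness of the split.
        totals[node["primary"]] += goodness
        # Each surrogate receives the same goodness, down-weighted by its
        # adjusted agreement with the primary split.
        for variable, adjusted_agreement in node["surrogates"]:
            totals[variable] += goodness * adjusted_agreement
    return dict(totals)


def scaled_for_printout(importance, cutoff=1.0):
    """Scale to sum to 100, drop variables below `cutoff` percent, round."""
    total = sum(importance.values())
    scaled = {v: 100.0 * x / total for v, x in importance.items()}
    ordered = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)
    return {v: round(p) for v, p in ordered if p >= cutoff}


# Hypothetical two-split tree (made-up values, for illustration only).
nodes = [
    {"primary": "age", "goodness": 10.0, "surrogates": [("height", 0.5)]},
    {"primary": "height", "goodness": 4.0,
     "surrogates": [("age", 0.25), ("weight", 0.05)]},
]

raw = variable_importance(nodes)
printed = scaled_for_printout(raw)
print(raw)
print(printed)
```

In this made-up example `age` collects 10 as a primary variable plus 4 * 0.25 as a surrogate, while `weight` ends up below 1% of the total and is therefore dropped from the scaled summary, mirroring the omission described in the vignette.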




statwangz commented 2 years ago

Thank you very much!