andosa / treeinterpreter

BSD 3-Clause "New" or "Revised" License

improve the efficiency of tree leaf contribution calculation #9

Closed SauceCat closed 6 years ago

SauceCat commented 7 years ago

The original code calculates the tree-level contribution vector separately for each instance. Because every instance ends up in exactly one leaf node, and all instances in the same leaf share the same decision path, this can be made more efficient on large datasets with a small modification:

  1. Calculate the contribution vector once for each unique leaf node and store the results in a dictionary that maps each leaf node to its contribution vector.
  2. Assign each instance the contribution vector of the leaf node it falls into, which avoids repeating the calculation for every instance.
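
The two steps above can be sketched roughly as follows. This is only an illustration on a hand-built toy tree, not the treeinterpreter code itself: the parallel lists (`LEFT`, `RIGHT`, `FEATURE`, `THRESHOLD`, `VALUE`) and all function names are hypothetical, chosen to loosely resemble how a fitted tree stores its nodes.

```python
# Toy regression tree stored as parallel lists (hypothetical layout).
# Index = node id; -1 in LEFT/RIGHT/FEATURE marks a leaf.
LEFT = [1, -1, 3, -1, -1]
RIGHT = [2, -1, 4, -1, -1]
FEATURE = [0, -1, 1, -1, -1]
THRESHOLD = [0.5, 0.0, 0.5, 0.0, 0.0]
VALUE = [10.0, 4.0, 16.0, 12.0, 20.0]  # mean target at each node
N_FEATURES = 2


def leaf_contributions():
    """Step 1: compute one contribution vector per leaf, exactly once."""
    table = {}
    stack = [(0, [0.0] * N_FEATURES)]  # (node id, contributions so far)
    while stack:
        node, contrib = stack.pop()
        if LEFT[node] == -1:
            table[node] = contrib
            continue
        for child in (LEFT[node], RIGHT[node]):
            child_contrib = list(contrib)
            # The split feature is credited with the change in node value.
            child_contrib[FEATURE[node]] += VALUE[child] - VALUE[node]
            stack.append((child, child_contrib))
    return table


def find_leaf(x):
    """Route a single instance down the tree and return its leaf id."""
    node = 0
    while LEFT[node] != -1:
        go_left = x[FEATURE[node]] <= THRESHOLD[node]
        node = LEFT[node] if go_left else RIGHT[node]
    return node


def predict_with_contributions(X):
    """Step 2: look up the cached vector for each instance's leaf."""
    table = leaf_contributions()
    bias = VALUE[0]  # value at the root
    return [(VALUE[leaf], bias, table[leaf])
            for leaf in (find_leaf(x) for x in X)]
```

With this layout the per-instance work is reduced to finding the leaf and doing a dictionary lookup, so the path walk is done once per leaf rather than once per row. Instances in the same leaf share the same list object here; a real implementation would probably copy or stack the vectors into an array.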