I've been running into an issue with the partial function when using it across multiple features with a custom grid table. I've noticed that the partial function works well with a single row in the train dataset but doesn't seem to give sensible results when you add more rows and vary multiple features. Here is an example:
The above shows that when using the partial function with more than one row, the result deviates from the average of the per-row predictions. I looked into the cause. At first I thought it was due to differences between xgb.DMatrix and data.matrix, but I was able to reproduce the error by stepping through the pardep function. The following line in pardep seems to cause the issue when pred.var contains more than one feature:
temp[, pred.var] <- pred.grid[i, pred.var]
It seems that once pred.grid has been converted with data.matrix, subsetting it to a single row collapses the result into a plain vector, and that vector is then used to fill temp column by column rather than row by row. Using the example above, my dataset is meant to look like
Mass Age
Row 1: 20 30
Row 2: 20 30
when running the first grid point (i.e., i = 1 in the foreach loop).
But what I'm seeing when stepping through is:
Mass Age
Row 1: 20 20
Row 2: 30 30
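The behaviour can be reproduced outside pardep with a minimal sketch in base R. The objects below are hypothetical stand-ins that reuse the names and values from the example; I am assuming temp is a plain matrix and pred.grid a one-row matrix, which may not match exactly how pardep builds them. The last part shows one possible workaround, not necessarily how the package should fix it.

```r
# Hypothetical stand-ins for the objects inside pardep (assumed shapes).
pred.var  <- c("Mass", "Age")
pred.grid <- data.matrix(data.frame(Mass = 20, Age = 30))
temp      <- matrix(0, nrow = 2, ncol = 2, dimnames = list(NULL, pred.var))

i <- 1

# Subsetting one row of a matrix drops it to a vector: c(Mass = 20, Age = 30).
# Assigning that vector to a 2 x 2 block recycles it in column-major order.
temp[, pred.var] <- pred.grid[i, pred.var]
print(temp)
#      Mass Age
# [1,]   20  20
# [2,]   30  30

# Possible workaround: expand the grid row into a full block, filled by row,
# so every row of the training data receives the same grid values.
fixed <- temp
fixed[, pred.var] <- matrix(pred.grid[i, pred.var],
                            nrow = nrow(temp), ncol = length(pred.var),
                            byrow = TRUE)
print(fixed)
#      Mass Age
# [1,]   20  30
# [2,]   20  30
```

With a single training row the recycled vector happens to land in the right cells, which would explain why the one-row case looks correct.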
Is this a known issue or am I doing something wrong when feeding in the parameters to partial?