Add standard deviation to PIG plots in regression case
Task: Adding standard deviation in each bin to PIG plots.
Task Description
With minor modifications to the code, it would be possible to include standard deviations in PIG plots. In the regression case, it could be useful to visually inspect the variance of the target within each bin. I would suggest not plotting it by default, but making it available through a dedicated argument for users who want it. The output could then look like this: [image]
In the function compute_pig_table, the aggregation should be extended to also calculate the standard deviation as follows:
    res = (
        basetable.groupby(predictor_column_name)
        .agg(
            avg_target=(target_column_name, "mean"),
            pop_size=(target_column_name, "size"),
            std_dev_target=(target_column_name, "std"),
        )
        .reset_index()
        .rename(columns={predictor_column_name: "label"})
    )
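To make the effect of the extra aggregation concrete, here is a minimal, self-contained sketch that runs it on a toy basetable. The wrapper name compute_pig_table_sketch and the column names age_bin and target are made up for illustration; this is not the library's actual function. Note that pandas' "std" aggregation is the sample standard deviation (ddof=1), so a bin containing a single row yields NaN.

```python
import pandas as pd


def compute_pig_table_sketch(basetable, predictor_column_name, target_column_name):
    """Hypothetical stand-in for compute_pig_table, showing only the aggregation."""
    return (
        basetable.groupby(predictor_column_name)
        .agg(
            avg_target=(target_column_name, "mean"),
            pop_size=(target_column_name, "size"),
            # Sample standard deviation (ddof=1); NaN for bins with one row.
            std_dev_target=(target_column_name, "std"),
        )
        .reset_index()
        .rename(columns={predictor_column_name: "label"})
    )


# Toy data: two bins of a binned predictor and a continuous target.
basetable = pd.DataFrame(
    {
        "age_bin": ["young", "young", "old", "old", "old"],
        "target": [1.0, 3.0, 2.0, 4.0, 6.0],
    }
)

pig_table = compute_pig_table_sketch(basetable, "age_bin", "target")
print(pig_table)
```

The resulting table has one row per bin with the columns label, avg_target, pop_size and std_dev_target, which is what the plotting step needs.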
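And in the function plot_incidence, you can use ax.errorbar instead of ax.plot, with yerr set to half the previously calculated standard deviation. Since ax.errorbar draws the bar from y - yerr to y + yerr, the full bar then spans one standard deviation.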
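The plotting side could be sketched as follows. This is only an illustration under assumptions: the function name plot_avg_target_sketch and the opt-in argument show_std are hypothetical, and the real plot_incidence does more than what is shown here. The sketch keeps ax.plot as the default and switches to ax.errorbar only when the argument is set.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_avg_target_sketch(pig_table, ax, show_std=False):
    """Hypothetical helper: draw the average target per bin, optionally with error bars."""
    positions = list(range(len(pig_table)))
    avg = pig_table["avg_target"].to_numpy()
    if show_std:
        # yerr is half the standard deviation, so the full bar spans one std.
        half_std = pig_table["std_dev_target"].to_numpy() / 2
        ax.errorbar(positions, avg, yerr=half_std, marker="o", capsize=4)
    else:
        ax.plot(positions, avg, marker="o")
    ax.set_xticks(positions)
    ax.set_xticklabels(pig_table["label"])
    ax.set_ylabel("avg_target")
    return ax


# Toy PIG table with the columns produced by the extended aggregation.
pig_table = pd.DataFrame(
    {
        "label": ["old", "young"],
        "avg_target": [4.0, 2.0],
        "pop_size": [3, 2],
        "std_dev_target": [2.0, 1.5],
    }
)

fig, (ax_default, ax_std) = plt.subplots(1, 2, figsize=(8, 3))
plot_avg_target_sketch(pig_table, ax_default)              # default: no error bars
plot_avg_target_sketch(pig_table, ax_std, show_std=True)   # opt-in: error bars
```

Keeping the argument off by default leaves existing plots unchanged, which matches the suggestion above to make the standard deviation opt-in.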