certara / tidyvpc

Package to Compute VPC Percentiles & Prediction Intervals Developed by Certara
https://certara.github.io/tidyvpc/index.html

`predcorrect()` call ordering #43

Closed: billdenney closed this issue 1 year ago

billdenney commented 1 year ago

Hi @billdenney, regarding why we cannot use predcorrect() before binning(): the ypc calculation uses median(pred) by bin (https://github.com/certara/tidyvpc/blob/8c7ed9c00ca9d60e19211d2731511bf0c2e05658/R/vpcstats.R#L597). In your pcVPC, when predcorrect() is specified first, stratbin is NULL, so the ypc calculation incorrectly uses an mpred value equal to median(pred) over the entire data set. We did have a check for this, but it was erroneously commented out when I made changes to allow binless() to be used before or after predcorrect().
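To make the effect of the missing bins concrete, here is a minimal base R sketch. It is not the tidyvpc source; it assumes the usual multiplicative correction, ypc = y * median(pred) / pred, and the data frame and its column names (`bin`, `pred`, `y`) are hypothetical.

```r
# Toy observed data: two bins with very different typical predictions.
obs <- data.frame(
  bin  = c(1, 1, 2, 2),
  pred = c(10, 20, 100, 200),
  y    = c(12, 18, 110, 190)
)

# Intended behavior: median(pred) is computed within each bin.
mpred_bin <- ave(obs$pred, obs$bin, FUN = median)
obs$ypc_binned <- obs$y * mpred_bin / obs$pred

# What happens when no bins exist yet: a single median over all rows.
mpred_all <- median(obs$pred)
obs$ypc_global <- obs$y * mpred_all / obs$pred

print(obs)
```

With the global median, the corrected values in both bins are pulled toward one common level, which is why the two call orders can give visibly different pcVPC plots.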

If using predcorrect() + binning(), we could simply use predcorrect() to update the tidyvpcobj with the required attributes (e.g., predcor) and perform the actual ypc calculation inside vpcstats(). This would allow users to specify the functions in whatever order, as is already the case for predcorrect() + binless() when calculating l.ypc. However, we don't do this because users can plot the bins without the vpc statistics (https://certara.github.io/tidyvpc/articles/tidyvpc_cont.html#visualize-bins), and when predcorrect() is used, the points on that plot are the obs$ypc values, so ypc must already exist before vpcstats() is called.

I've made some changes to predcorrect() usage, as I described to you. The loess.ypc argument inside binless() is now deprecated: binless() + predcorrect() automatically performs a LOESS pcVPC without needing the redundant argument. I also now ensure that binning() is performed before predcorrect() when performing a traditional pcVPC. Users can still call binless() after predcorrect() for backwards compatibility, but they will now receive a warning. See PR: https://github.com/certara/tidyvpc/pull/42
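The ordering requirement can be pictured with a small guard. The sketch below uses a plain list as a hypothetical stand-in for a tidyvpcobj; `new_toy()`, `toy_binning()`, and `guarded_predcorrect()` are illustrative names, not tidyvpc functions, and the error message is made up.

```r
# Hypothetical stand-in for a tidyvpcobj: a plain list.
new_toy <- function(obs) list(obs = obs, bin = NULL, predcor = FALSE)

# Store a bin label for each observation.
toy_binning <- function(o, bin) {
  o$bin <- bin
  o
}

guarded_predcorrect <- function(o) {
  # A traditional pcVPC needs bins to compute median(pred) per bin.
  if (is.null(o$bin)) {
    stop("toy_binning() must be called before guarded_predcorrect()",
         call. = FALSE)
  }
  mpred <- ave(o$obs$pred, o$bin, FUN = median)
  o$obs$ypc <- o$obs$y * mpred / o$obs$pred
  o$predcor <- TRUE
  o
}

toy <- new_toy(data.frame(pred = c(10, 20, 100, 200),
                          y    = c(12, 18, 110, 190)))

# Wrong order: the guard stops with an error instead of silently
# falling back to a global median.
msg <- tryCatch(guarded_predcorrect(toy),
                error = function(e) conditionMessage(e))

# Right order: bins first, then prediction correction.
ok <- guarded_predcorrect(toy_binning(toy, c(1, 1, 2, 2)))
```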

I'm not too happy about the additional warnings and implementation in general, but this is required to preserve existing behavior. We may decide to overhaul some of this for the next major version release. Happy to hear your thoughts.

Originally posted by @certara-jcraig in https://github.com/certara/tidyvpc/issues/36#issuecomment-1648547203

billdenney commented 1 year ago

My overall thoughts are that making things easier for users is usually better (unless it causes code to be very difficult to maintain).

So, doing everything nearly as late as possible or redoing work if calls are in the "wrong" order is my main thought. To maintain current functions (and I do see the value in plotting the bins), I think that my ideal would be:

If someone calls predcorrect() then binning(), a predcorrect flag is set in the object and prediction correction could be redone after binning.

I don't know if the original data is stored in a way to allow that, but I think that would be easiest for users. That would also prevent the need for any (or many) warnings.
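A rough sketch of that idea in base R, again with a plain list standing in for the real object. All names here (`new_flag_vpc()`, `apply_pc()`, `flag_predcorrect()`, `flag_binning()`) are hypothetical, and the sketch assumes the uncorrected `y` and `pred` columns stay stored so the correction can be recomputed.

```r
new_flag_vpc <- function(obs) list(obs = obs, bin = NULL, predcor = FALSE)

# Recompute ypc from the stored y and pred, using whatever bins exist now.
apply_pc <- function(o) {
  grp <- if (is.null(o$bin)) rep(1L, nrow(o$obs)) else o$bin
  mpred <- ave(o$obs$pred, grp, FUN = median)
  o$obs$ypc <- o$obs$y * mpred / o$obs$pred
  o
}

# Set the flag and correct with the bins available so far (possibly none).
flag_predcorrect <- function(o) {
  o$predcor <- TRUE
  apply_pc(o)
}

# Store the bins; if correction was already requested, redo it per bin.
flag_binning <- function(o, bin) {
  o$bin <- bin
  if (isTRUE(o$predcor)) o <- apply_pc(o)
  o
}

obs  <- data.frame(pred = c(10, 20, 100, 200), y = c(12, 18, 110, 190))
bins <- c(1, 1, 2, 2)

a <- flag_binning(flag_predcorrect(new_flag_vpc(obs)), bins)  # "wrong" order
b <- flag_predcorrect(flag_binning(new_flag_vpc(obs), bins))  # usual order
identical(a$obs$ypc, b$obs$ypc)
```

Because obs$ypc is refreshed whenever the bins change, a bin plot drawn before computing statistics would still show per-bin corrected points in either order.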