Open billdenney opened 1 year ago
A similar issue is happening with prediction corrections.
The issue appears to be related to these lines where the data are assumed to be sorted identically between the observed and simulated data:
I had not filtered for MDV == 0 in my data. (That's on me.) Is there a way that we can help users by assisting with this filtering or otherwise helping make sure that the observed and predicted data line up?
I had a thought to help make sure that the data line up. We could:
stopifnot(all(data[[o$xobs_name]] == obs$x[repl]))
Thanks for the PR's @billdenney, I'll be reviewing/testing today and will likely be merging all. Regarding issue with ordering of data, yes, we do have requirements for observed/simulated data pre-processing and put the onus on the user to ensure data is arranged correctly and filtering of MDV == 0: https://certara.github.io/tidyvpc/#data-preprocessing
I think your solution to ensure that the data lines up is indeed viable though.
@certara-jcraig Thanks! Hopefully, I didn't overload you here. When I get excited about a new tool, I jump in with both feet first! 😄
@billdenney haha, no problem - we are happy to have your contributions here along the road to make tidyvpc the 'go-to' package for VPC's in R! Thanks again for your help.
please note that we can do "external vpc" with tidyvpc i.e use a previous simulation from a model and contrast with newly observed data that is why we allowed simdata to be a non replicate number of the obsdata. also when you filter blq on obs and sim there is no guarantee that sim will end up being a multiple of the obs
thanks for the comment @smouksassi , good point - @billdenney looks like we should in fact replace the stop() with warning() as to support use case Samer mentions
@smouksassi and @certara-jcraig, thanks for clarifying that use case. It's an important one, but it's not clear to me how the current code would align the replicates correctly in that case. The prior code used the x-values from the observed data, unless I'm missing something. It seems like we would need a different method of assigning x-values for simulated data for that use case to work. If I'm missing something, please let me know.
I agree we should make a use case and give it more thought. for now we are binning the x from observed and forcing it to be the same on the sims (but will have to check)
I've been thinking about this a bit more, and I'm not certain how @smouksassi's use case functions. To be sure that we're all clear, my interpretation of the two use cases described here are:
The first case above works with the current tidyvpc
code, and it needs to be maintained with the current functionality.
As I'm thinking more about the second use case, it's not exactly clear how the simulated data would be used for comparison without regenerating the simulations. As a simple example, if a model were built on single-dose data and the new data were multi-dose, then a multi-dose simulation would be required for the VPC to be a reasonable comparison. In that case, the current workflow would still function for comparing an old model to the current data; the VPC simulations would need to be updated with the current data.
Are you thinking of something else @smouksassi ?
hi @billdenney I gave it a second thought yes I agree if user wants to do an external VPC we still need to vit a max iteration = 0 run to get the PRED and the design of the observed data so we know how to bin split etc.
now back to my other concern: https://github.com/certara/tidyvpc/issues/22
in the original pcvpc paper discussion one of the method is to omit BLQ similarly on obs and simulated and then do pred correction that was the spirit on why we allowed pred correction with filtering of blq but as per the issue above the argument was never ported and got lost in translation :D
workaround I am doin gnow was obsdata %>% filter(DV > BLQ) simdata%>% filter(DV > BLQ)
the above there is now guaranteed tha the sim data is a multiple of obs because simulation mode can vary how many blq we will get and then we cannot use tidyvpc anymore
we can keep it as warning or reintroduce the ability to filter blq later in the pipeline
@smouksassi , Thanks for thinking it through with me. In that case, should we close this issue and focus the BLQ/ALQ discussion in #22?
I think we should discuss on not allowing the user to do ped corrected vpc without filtering out the blq or allow and warn either way we need to allow the filter blq ( see discussion in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3085712/ I think the latest psn will allow and warn when user do pred correction with M3 but need to test
Ways to handle data censored due to limit of detection in VPCs have previously been suggested ([6](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3085712/#CR6),[21](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3085712/#CR21)). These approaches are not directly applicable to pcVPCs and pvcVPCs since the observations lose their rank order after prediction and variability correction. Observations censored due to limit of detection will therefore need to be handled in the same way as previously suggested for drop-out censoring of observations ([22](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3085712/#CR22)). This approach is based on omitting censored observations (e.g., BQL) in a similar manner for both the observed and simulated datasets before calculating the VPC statistics. In this case, the interpretation of the pcVPC/pvcVPC for the continuous observations will be heavily dependent on an accurate prediction of the data censoring. Therefore, diagnostics ensuring accurate prediction of censoring frequency (e.g., percentage of observation below LLOQ) versus the independent variable will be important before assessing the pcVPC/pvcVPC for the continuous observations.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2691472/
`
In the example below, all simulated values are reasonably good, but the simulated results are far from the expected values. My guess is that there is an error creating the bins in the simulated data.
Created on 2023-07-17 with reprex v2.0.2