bluefoxr / COINr

Sensitivity Analysis warnings #60

Open OscarSmallenbroek opened 1 week ago

OscarSmallenbroek commented 1 week ago

Hi Will,

We noticed that the sensitivity analysis skips certain steps that are included in the sensitivity analysis specifications, without clearly communicating this to the user. It is likely that the SA falls back on defaults, but it is not clear to the user when this happens.

For example, when you specify weights in get_sensitivity() you get a warning: "Address $Log$Aggregate$w is not currently present in the coin or else is NULL. Continue anyway (y/n)?" The SA does implement the weights in this case. We therefore assumed that this message tells the user that the original coin does not have explicit weights, but that they will be implemented in the SA as specified.
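For context, this is roughly how we specify the weights alternative, following the sensitivity vignette (a minimal sketch; the noise settings and number of replications are just illustrative):

```r
library(COINr)

# nominal coin built from the packaged ASEM example (includes an Aggregate() step)
coin <- build_example_coin(quietly = TRUE)

# perturb the level-2 weights around their nominal values
nspecs <- data.frame(Level = 2, NoiseFactor = 0.25)
noisy_w <- get_noisy_weights(w = coin$Meta$Weights$Original,
                             noise_specs = nspecs, Nrep = 10)

# SA specification targeting the weights argument of Aggregate() via its Log address
SA_specs <- list(
  Weights = list(
    Address = "$Log$Aggregate$w",
    Distribution = noisy_w,
    Type = "discrete"
  )
)

# since Aggregate() was originally called with its default w = NULL, this is the
# point where the "$Log$Aggregate$w is not currently present..." prompt appears
SA_res <- get_sensitivity(coin, SA_specs = SA_specs, N = 10, SA_type = "UA",
                          dset = "Aggregated", iCode = "Index", quietly = TRUE)
```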

However, we found that when you build a coin without imputation and then run the SA with imputation alternatives, you get this message: "Address $Log$Impute is not currently present in the coin or else is NULL. Continue anyway (y/n)?" But the SA does NOT implement any imputation strategy, most likely because the default settings of Aggregate() point to the Normalised data set rather than the Imputed data set. Yet when we do specify dset = "Imputed" in the Aggregate() alternatives in SA_specs, we get an error that dset "Imputed" is not found in the original coin, and the SA does not run.

We would just like to flag this inconsistency in the messages and warnings. We understand that we have to run Impute(coin, disable = TRUE) on the original coin to get the SA to work.
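Concretely, the workaround looks something like this (a sketch using the bundled ASEM_iData and ASEM_iMeta; the imputation alternatives are only illustrative):

```r
library(COINr)

# build the nominal coin with an explicitly disabled imputation step, so that
# $Log$Impute exists and Normalise()/Aggregate() point at the expected data sets
coin <- new_coin(ASEM_iData, ASEM_iMeta)
coin <- Impute(coin, dset = "Raw", f_i = "i_mean", disable = TRUE)  # data passed through unaltered (see ?Impute)
coin <- Normalise(coin, dset = "Imputed")
coin <- Aggregate(coin, dset = "Normalised")

# the SA can now vary the imputation function through its Log address
SA_specs <- list(
  Imputation = list(
    Address = "$Log$Impute$f_i",
    Distribution = c("i_mean", "i_median"),
    Type = "discrete"
  )
)
```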

We suggest that the SA should not run at all if the required data set is not present in the coin or there is no default alternative, and that "dset" should be made mandatory in all SA specifications. That way users would have to specify explicitly which operations are performed and in what order, and the SA would be prevented from running without telling the user that it is skipping some calculations.

Best, Oscar

bluefoxr commented 1 week ago

Hi Oscar, thanks for flagging this. I think the issue here is that sensitivity analysis is complex and involves regenerating the coin based on the function arguments stored in the log file. What actually happens is that the sensitivity analysis calls Regen() and runs the functions it finds stored in the Log part of the coin, in the order that they appear, with the arguments it finds there. The sensitivity analysis is effectively editing the Log part of the coin.
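To illustrate with the example coin (a rough sketch):

```r
library(COINr)
coin <- build_example_coin(quietly = TRUE)

# the Log is a named list of the building-function calls made so far, in order
names(coin$Log)

# each entry stores the arguments that were passed, e.g. those of Aggregate()
str(coin$Log$Aggregate, max.level = 1)

# Regen() replays those calls in the order they appear, with the stored arguments
coin2 <- Regen(coin, quietly = TRUE)
```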

The thing is that, for each coin generated in the sensitivity analysis, you need to end up with a coherent coin that does what you want it to do, and in some cases this requires quite some care from the user. What is probably happening with the imputation in your case is that the operation is being added to the end of the Log list, since it doesn't already exist, so it is actually run after the aggregation. I don't think that forcing users to specify the data set would fix this problem, though, since Regen() will always follow the list order. Plus, enforcing hard constraints like that from the SA would be complicated to get right.

Not sure what to suggest here, but if you send me a reproducible example I can take a look and have a think about it. My feeling, though, is that this is just a product of the complexity of the sensitivity analysis: since COINr allows a lot of flexibility in changing parameters, it comes at the price of needing to be quite careful and to understand what is going on inside.

OscarSmallenbroek commented 1 week ago

Hi Will,

Thank you for the clarification. I will take a closer look at Regen to understand what is going on.

I am attaching the R script I used to verify the issue, making good use of your very helpful build_example_coin().

Given the complexity of the sensitivity analysis, Panos suggested the following compromise: change the vignette and the function details to state that "all the steps tested in the SA/UA should exist in the original coin as well (so do the disable thingy above to be sure)". That would let people know that they need to model the original coin like this for the SA to work properly.

We only just figured out this aspect of the sensitivity analysis, and I am sure I have done at least one audit by building a coin that did not have imputation but then used imputation in the SA (whoops). Of course it is my fault for not checking everything properly.

The larger issue is that we are not sure whether the SA does what we think it does. To address this, I thought we could modify get_sensitivity() to save descriptive statistics of each data set for each version of the coin (with the nominal weights), so we can check whether things have been implemented as expected.
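Something along these lines is what I had in mind; summarise_coin_dsets() below is just a hypothetical helper to show the idea, not an existing COINr function:

```r
# hypothetical helper: descriptive statistics for every data set stored in a coin,
# so that coins regenerated in the SA can be compared against the nominal coin
summarise_coin_dsets <- function(coin) {
  lapply(coin$Data, function(dset_df) {
    summary(dset_df[vapply(dset_df, is.numeric, logical(1))])
  })
}

# e.g. summaries for the nominal coin, for later comparison against SA runs
nominal_summaries <- summarise_coin_dsets(coin)
```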

All the best, Oscar


bluefoxr commented 1 week ago

Yes, maybe it makes sense, as you say, to at least issue a warning if the SA specifies a parameter of a function that has not actually been used in the existing coin. I can do that.

The other thing I can do is enable a kind of diagnostic mode in which, for each iteration, the coin is actually saved into a big list of coins. That way you can check in as much detail as you like what is going on.

bluefoxr commented 1 week ago

Hi @OscarSmallenbroek, I added some things to address this as discussed above. Now, if you specify a parameter of a function that is not in the Log at all, the SA will issue a warning. I also added the diagnostic mode so you can check what is happening in the coins generated by the SA. See https://github.com/bluefoxr/COINr/commit/6074772e868e10220432f30a6e9b7b3407821e28
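A rough usage sketch (the diagnostic_mode argument name and the output structure are assumptions based on the linked commit rather than confirmed API; check ?get_sensitivity in the updated version for the exact interface):

```r
# sketch only: argument name assumed from the linked commit, not confirmed here
SA_res <- get_sensitivity(coin, SA_specs = SA_specs, N = 10, SA_type = "UA",
                          dset = "Aggregated", iCode = "Index",
                          diagnostic_mode = TRUE, quietly = TRUE)

# inspect the output, including any coins stored from each SA iteration
str(SA_res, max.level = 1)
```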