CITCOM-project / causcumber

Cucumber driven causal inference for testing computational models.

Semi-automatic generation of causal graphs should consider input-input relationships for observational data #9

Open AndrewC19 opened 3 years ago

AndrewC19 commented 3 years ago

When we are running the model freely for testing purposes, we make the assumption that the inputs used for each execution are independent of each other.

For example, if we want to isolate the effect of introducing the pfizer vaccine, we would simply run the model two times: once with a pfizer vaccine and once without, making sure that nothing else is changed. As a result, we can be sure that any difference is caused by the vaccine (or some non-determinism, but this is dealt with by repeats).
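A minimal sketch of that kind of controlled comparison, using a hypothetical `run_model` stand-in for the simulation (the function name, parameters, and toy behaviour are illustrative, not the real causcumber or Covasim API):

```python
import random
import statistics

# Hypothetical, toy stand-in for the simulation under test; the name,
# parameters, and behaviour are illustrative, not a real API.
def run_model(vaccine=None, pop_size=10_000, pop_infected=100) -> float:
    """Return cumulative deaths for one (stochastic) model execution."""
    base = pop_infected * 2.0
    reduction = 0.4 if vaccine else 0.0  # toy vaccine effect
    return base * (1 - reduction) + random.gauss(0, 5)

# Hold every other input fixed and vary only the vaccine; repeats
# average out the model's non-determinism.
control = [run_model(vaccine=None) for _ in range(30)]
treatment = [run_model(vaccine="pfizer") for _ in range(30)]
effect = statistics.mean(treatment) - statistics.mean(control)
print(f"Estimated effect of the vaccine on cumulative deaths: {effect:.1f}")
```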

However, when we use observational data to predict the outcome of a test case that we have not executed, we cannot make the same assumption. Here it could be the case that inputs are associated.

For example, if I used previous execution data in which a vaccine was implemented to be widely available in the UK but not in France, then the vaccine parameter is dependent on the location. If location also has an effect on an outcome such as cumulative number of deaths, this would induce confounding that must be controlled for.
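To make that confounding structure concrete, here is a sketch of the corresponding DAG in networkx (the variable names follow the example above; the graph is the point, not any particular causcumber API):

```python
import networkx as nx

# Location influences both whether the vaccine was rolled out and the
# outcome, so it confounds the vaccine -> cum_deaths relationship.
dag = nx.DiGraph()
dag.add_edges_from([
    ("location", "vaccine"),     # input-input edge: rollout varied by country
    ("location", "cum_deaths"),  # location also affects the outcome directly
    ("vaccine", "cum_deaths"),   # the causal effect we want to estimate
])

# Common causes of treatment and outcome must be adjusted for.
confounders = set(dag.predecessors("vaccine")) & set(dag.predecessors("cum_deaths"))
print(confounders)  # {'location'}
```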

In short: we should give users the ability to add edges between inputs if they are using the observational data tag.

jmafoster1 commented 3 years ago

Should we implement this as "give users the ability to add edges between inputs" (for example, with an additional Given, like when pruning edges) or should we implement it as "force users to consider edges between inputs" by making the inputs a fully connected substructure in the graph, from which they must prune edges? In the latter case, this would only happen when using the @observational tag.

AndrewC19 commented 3 years ago

I can't think of any circumstances where you would have relationships between the inputs if you are running the system freely to test it. I could be wrong though.

Based on that, I think the best way to handle this would be to make the inputs fully connected if an @observational tag is used.
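For concreteness, such a fully connected input substructure might be built like this (a networkx sketch; orienting edges by list order is just an arbitrary way to keep the graph acyclic, not causcumber's actual behaviour):

```python
from itertools import combinations

import networkx as nx

def fully_connect_inputs(dag: nx.DiGraph, inputs: list) -> nx.DiGraph:
    """Add an edge between every pair of inputs, oriented by list order so
    the result stays acyclic; users would then prune the spurious edges."""
    dag.add_edges_from(combinations(inputs, 2))
    return dag

dag = nx.DiGraph([("vaccine", "cum_deaths"), ("location", "cum_deaths")])
fully_connect_inputs(dag, ["location", "vaccine", "pop_size"])
print(sorted(dag.edges))
```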

What are your thoughts?

neilwalkinshaw commented 3 years ago

Wouldn't you want to leave that possibility open, though, Andy? I suppose it depends on what you mean by "running the system freely to test it" (perhaps I am misinterpreting).

My line of thought is that you might want to, for a given test, incorporate the notion that the values of one parameter are conditional on the values of some other parameter. Such interdependence between parameters is quite common in ordinary configurable software, and you can easily envisage interdependencies between input parameters that have a bearing on the output.

The existence of such interdependencies forms the basis for combinatorial testing. I'd expect these sorts of relationships to be just as prevalent in computational model testing?

AndrewC19 commented 3 years ago

In that case it’s probably best to produce a fully connected causal graph (including inputs) for both observational data and when freely running the model.

We should find an example of this kind of interdependence in a Covasim scenario.

neilwalkinshaw commented 3 years ago

As a starter, I suppose there are the trivial ones (but they matter from a testing perspective) - pop_size and pop_infected - you can't have pop_infected > pop_size. Or you can't have new_infections > cum_infections. I.e., your choice for one will immediately constrain your choice for the other.

Even if those particular examples don't quite hold up (this is from a quick scan of the input docs), you get the gist.
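To show that gist concretely, here is a small sketch of constrained input sampling, where the first draw immediately bounds the second (plain Python, purely for illustration):

```python
import random

def sample_config(seed=None):
    """Draw a (pop_size, pop_infected) pair; the first draw bounds the
    second, which is exactly the input-input dependence in question."""
    rng = random.Random(seed)
    pop_size = rng.randint(1_000, 100_000)
    pop_infected = rng.randint(0, pop_size)  # only valid up to pop_size
    return {"pop_size": pop_size, "pop_infected": pop_infected}

print(sample_config(0))
```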

Now, whether (or when) this should count as "causal" in our context is perhaps something we should discuss. In some cases there is a clear link - e.g. setting a boolean variable x to true means that variable y will be read; otherwise it won't be.

However, in the actual covasim examples I've picked above, the relationship is a bit more ambiguous.

AndrewC19 commented 3 years ago

The relationship between pop_size and pop_infected makes sense to me, but I'm not sure whether it's causality or not. Does pop_size actually cause pop_infected?

When we change pop_size we don't change the value of pop_infected, but we do set an upper limit on its value. There's also the issue of which variable causes which: you could equally consider a case where the model is designed around an initial pop_infected, so pop_size must be at least that large.

I agree that we should discuss this and maybe get Nick's opinion on the issue.

jmafoster1 commented 3 years ago

I think causality can certainly be argued here, since changing one affects the possible values of the other. The causal relationship is just not smooth --- it's more of a step function of valid/invalid values --- and it doesn't really make much sense to try and estimate one from the other. Which causes which definitely depends on what you're doing, so you'd have to draw your causal model accordingly.

On the other hand, the "causality" probably isn't going to make that much difference to the actual functioning of the model in this case, and you could more or less argue independence. The initial infected is independent of the population size until it isn't, if that makes sense, but the model will not run for invalid configurations so you're only going to get data from one of the "treatment groups" (valid configuration).

Going back to the original issue, I'm personally in favour of having an extra Given which allows users to add edges, just as they can prune them now. Inputs are mostly independent, so users would have a lot of edges to prune if we gave them a fully connected input substructure.

It wouldn't be that difficult to implement both options, which the user could then control in their DAG-drawing scenario, but then we're getting dangerously close to "here's a specific subset of Gherkin that we support". Really, the user needs to take responsibility for their own DAG. DAGs are so problem-specific that I don't think it's possible to come up with an "easy" way of drawing them.
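A behave-style sketch of what such a Given might look like; the step wording and the `context.dag` attribute are assumptions for illustration, not causcumber's actual step definitions:

```python
# steps/dag_steps.py -- illustrative only, not causcumber's actual steps.
from behave import given
import networkx as nx

@given('there is an edge from "{source}" to "{target}"')
def step_add_edge(context, source, target):
    """Let the user declare an edge explicitly (including input-input
    edges), mirroring the existing edge-pruning Given."""
    if not hasattr(context, "dag"):
        context.dag = nx.DiGraph()
    context.dag.add_edge(source, target)
    # Reject additions that would make the graph cyclic.
    assert nx.is_directed_acyclic_graph(context.dag), \
        f"Adding {source} -> {target} creates a cycle"
```

In a feature file this would then read, e.g., `Given there is an edge from "location" to "vaccine"` under the @observational tag.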

neilwalkinshaw commented 3 years ago

I agree with this.

I also think it's worth drawing attention to the subtle difference between the objectives of the modeller and the tester.

You're right that the step-wise nature of the causal relationship probably won't matter to the modeller, who just wants to find out realistic things (e.g. how the vaccine rollout in Japan will be affected by new variants given current populations).

But then there's the tester. The tester does care about these relationships. They would care about the extreme cases - e.g. "Does the model behave as expected in a pathological scenario where pop_size == pop_infected?" This is why having these edges in the graph could be important.

We need to try to ensure that we are looking at the model from the mindset of the tester, who wants to do all the sanity-checks / establish metamorphic relations, etc.
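For instance, a sanity check for the pathological boundary case mentioned above might look like this (pytest-style; `run_model` is again a hypothetical, toy stand-in for the simulation, not a real API):

```python
def run_model(pop_size: int, pop_infected: int) -> dict:
    """Hypothetical, toy stand-in for the simulation under test."""
    new = min(pop_infected * 2, pop_size) - pop_infected  # toy spread rule
    return {"new_infections": max(new, 0)}

def test_everyone_initially_infected():
    """Pathological boundary: with the whole population already infected,
    there is nobody left to newly infect."""
    assert run_model(pop_size=1_000, pop_infected=1_000)["new_infections"] == 0
```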