causalincentives / pycid

Library for graphical models of decision making, based on pgmpy and networkx
Apache License 2.0
96 stars 13 forks source link

Bug in revised context workaround #33

Closed edlanglois closed 3 years ago

edlanglois commented 3 years ago

For the revised context workaround of MACIDBase.query it looks like the way pgmpy determines if you are passing indices is to check if any are not valid state names. As a result, this workaround will fail if it is ever the case that state_names are the numbers 0..n but not in order. Then pgmpy will think we're passing state names when really we're passing indices.

I don't see how to get around this with pgmpy as-is. What bug is this a workaround for?

Jamesfox1 commented 3 years ago

are you sure it fails for this....? I will check tomorrow and write a test for it

Jamesfox1 commented 3 years ago

pgmpy has a bug where Bayesianmodel.query throws an error when passing in state names which aren't the indices :( I've lodged an issue with them

edlanglois commented 3 years ago

I haven't actually tested it so I might be wrong. I noticed though that the warning pgmpy produces when it decides the state names are indicies is only produced in one test case: test_expected_utility when using "d" or "c" as values. So that suggests to me that it is interpreting the states as labels in all other cases.

tom4everitt commented 3 years ago

@Jamesfox1, could you link to the pgmpy bug that you filed for this?

It would be good to test if this actually fails. If it does, one proper workaround would be to before running a query, first create a copy of the BayesianModel, where all state_names are removed, and then run the query on that copy

edlanglois commented 3 years ago

I spent a bit of time earlier this week trying to diagnose the pgmpy bug. I can confirm that there is one: if you use context instead of revised_context then the test with d and c states fails. I haven't yet identified the cause though. I noticed that pgmpy creates a ton of copies of the CPDs during query and in some of those copies the state labels are gone. I'm suspicious of the operators like mul possibly not preserving labels if only one side has them but I haven't checked it out further.

tom4everitt commented 3 years ago

I implemented the fix that I proposed above for this.

Basically, make a copy of the CID that doesn't have any state_names, and run the query on that.