causalincentives / pycid

Library for graphical models of decision making, based on pgmpy and networkx
Apache License 2.0

random_cids sometimes hangs forever #45

Open edlanglois opened 3 years ago

edlanglois commented 3 years ago

This is likely the cause of these test failures:
https://github.com/causalincentives/pycid/actions/runs/667798848
https://github.com/causalincentives/pycid/actions/runs/666392113
https://github.com/causalincentives/pycid/actions/runs/666341225
https://github.com/causalincentives/pycid/actions/runs/665993248

Those runs are from before I split up the test_random functions. When I test locally, it's test_random_cids_create_one that hangs.

edlanglois commented 3 years ago

Traceback from stopping the test:

  File "/home/eric/dev/pycid/pycid/core/cpd.py", line 219, in initialize_tabular_cpd
    [[complete_dictionary(self.stochastic_function(**i))[t] for i in self.parent_values(cid)] for t in domain]
  File "/home/eric/dev/pycid/pycid/core/cpd.py", line 219, in <listcomp>
    [[complete_dictionary(self.stochastic_function(**i))[t] for i in self.parent_values(cid)] for t in domain]
  File "/home/eric/dev/pycid/pycid/core/cpd.py", line 219, in <listcomp>
    [[complete_dictionary(self.stochastic_function(**i))[t] for i in self.parent_values(cid)] for t in domain]
  File "/home/eric/dev/pycid/pycid/core/cpd.py", line 207, in complete_dictionary
    missing_keys = set(domain) - set(dictionary.keys())
KeyboardInterrupt

I don't know the details of what it's trying to do, but in my tests the matrix in initialize_tabular_cpd sometimes has shapes like (107, 4096) or (32, 5120) and takes a long time to generate. When it runs quickly, the shapes are more like (9, 80). The first dimension is card.
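A rough way to see why those shapes get slow: the comprehension in the traceback builds one row per value t in the node's domain and one column per joint assignment of the parents, so the table is card × (product of parent domain sizes), and stochastic_function is called once per cell. The sketch below uses hypothetical helper names (not pycid's API) and assumes each parent has a finite list of values.

```python
import itertools
import math
from typing import Dict, Iterator, List, Tuple


def tabular_cpd_shape(card: int, parent_domains: Dict[str, List[int]]) -> Tuple[int, int]:
    """Hypothetical helper: (rows, columns) of a tabular CPD.

    Columns = number of joint parent assignments = product of parent domain sizes
    (1 for a node with no parents).
    """
    n_cols = math.prod(len(values) for values in parent_domains.values())
    return (card, n_cols)


def enumerate_parent_values(parent_domains: Dict[str, List[int]]) -> Iterator[Dict[str, int]]:
    """Yield every joint parent assignment as a dict, one per CPD column."""
    names = list(parent_domains)
    for combo in itertools.product(*(parent_domains[name] for name in names)):
        yield dict(zip(names, combo))


# Twelve binary parents already give 2**12 = 4096 columns; with card = 107 the
# comprehension would evaluate stochastic_function 107 * 4096 = 438272 times.
twelve_binary_parents = {f"p{i}": [0, 1] for i in range(12)}
print(tabular_cpd_shape(107, twelve_binary_parents))
```

Under this reading, (32, 5120) would fit, for example, one 5-valued parent plus five 4-valued parents (5 × 4**5 = 5120), though the actual parent sets in the failing runs aren't shown here.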

tom4everitt commented 3 years ago

Interesting. I've never noticed it failing locally, but I have seen the GitHub Actions version time out occasionally.

My guess is that the matrices get really large when a single node has many parents, because the number of joint parent outcomes grows exponentially with the number of parents. We should probably add a max_degree parameter to random_cid and avoid adding edges into nodes that already have many parents.
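A minimal sketch of the max_degree idea, assuming the generator samples a DAG edge by edge: edges only go from lower to higher node index (so the graph is acyclic), and a node stops accepting parents once it reaches the cap. The function names and the edge_density parameter are hypothetical, not pycid's actual random_cid signature.

```python
import random
from typing import List, Optional, Tuple


def random_dag_edges(
    n_nodes: int,
    edge_density: float,
    max_in_degree: int,
    seed: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Hypothetical generator: random DAG on nodes 0..n_nodes-1 with capped in-degree."""
    rng = random.Random(seed)  # local RNG, so a seed makes the output reproducible
    edges: List[Tuple[int, int]] = []
    for child in range(n_nodes):
        # Only earlier nodes may be parents, which guarantees acyclicity.
        candidates = list(range(child))
        rng.shuffle(candidates)
        n_parents = 0
        for parent in candidates:
            if n_parents >= max_in_degree:
                break  # cap reached: skip remaining edges into this node
            if rng.random() < edge_density:
                edges.append((parent, child))
                n_parents += 1
    return edges


def worst_case_table_entries(max_card: int, max_in_degree: int) -> int:
    """Upper bound on CPD entries when every node has at most max_card values."""
    # card rows times at most max_card**max_in_degree parent assignments.
    return max_card ** (max_in_degree + 1)
```

With the cap in place, table size is bounded by a constant that doesn't depend on graph size: for example, nodes with at most 4 values and at most 3 parents give at most 4**4 = 256 entries per CPD.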

tom4everitt commented 3 years ago

Alternatively (or additionally), we can set the random seed in the test.
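A minimal sketch of seeding in a test, assuming the random CID generator draws from Python's global `random` module and/or NumPy's global RNG (which one it actually uses isn't stated here, so both are seeded). `seed_all` and `reproducible_sample` are hypothetical names.

```python
import random


def seed_all(seed: int) -> None:
    """Seed every global RNG the generator might draw from (an assumption)."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)  # legacy global NumPy RNG
    except ImportError:
        pass  # NumPy not installed: only the stdlib RNG needs seeding


def reproducible_sample(seed: int, n: int = 5) -> list:
    """Stand-in for 'generate a random model': same seed, same draws."""
    seed_all(seed)
    return [random.random() for _ in range(n)]
```

Note that a fixed seed makes the test deterministic, but it only avoids the hang if the chosen seed happens to produce small tables; it doesn't bound the table size the way a degree cap would.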

tom4everitt commented 3 years ago

Also, I just pushed an improvement to random_cpd that I think should generally lead to smaller matrices. So with a bit of luck, the problem has already been solved.