Use same case for random variables as function arguments

causalincentives / pycid

Library for graphical models of decision making, based on pgmpy and networkx

Apache License 2.0

96 stars 13 forks source link

Use same case for random variables as function arguments #53

Closed tom4everitt closed 2 years ago

tom4everitt commented 2 years ago

Currently, the below doesn't work:

cid = pycid.CID([('X', 'D'), ('X', 'U'), ('D', 'U')], decisions=['D'], utilities=['U']) cid.add_cpds( X=pycid.discrete_uniform([0, 1]), U=lambda X, D: int(X == D), D=[0, 1])

because U=lambda X, D: int(X == D) has to have X and D in lowercase. This seems counterintuitive.

Two possible fixes:

Make arguments case-insensitive, so either lower or upper case are fine
Allow only arguments that match the case of the variable name

Some considerations: Option 1 has the benefits of backwards compatibility, and it also lets people use lowercase arguments for functions (which PEP requires). Option 2 would make the implementation a bit simpler, and let people use "X" and "x" for two different variables.

tom4everitt commented 2 years ago

@edlanglois any thoughts?

edlanglois commented 2 years ago

What if the CPD lambda takes a dictionary instead? lambda p: int(p['X'] == p['D']) That way it will work with node names that aren't valid identifiers like numbers or names with spaces.

edlanglois commented 2 years ago

Of the two options you described, I prefer 2. There are already variables that can't be expressed in the named-argument version so I don't think we should have the complexity/ambiguity of lower casing to avoid a PEP error for some of the remaining variables. Users can silence it explicitly if they use capitalized variables.

It's possible that I'm leaning too hard into purity over convenience though so 1 is OK too. It would still be an improvement over the current situation.

tom4everitt commented 2 years ago

What if the CPD lambda takes a dictionary instead? lambda p: int(p['X'] == p['D']) That way it will work with node names that aren't valid identifiers like numbers or names with spaces.

I think it's always possible to do this, you can just use lambda p**: int(p['X'] == p['D']) instead.

Probably the best is to switch to option 1, and add a deprecation warning when people mismatch the case. After a while we can clean up the code by switching to 2, if that still seems like the right move.

edlanglois commented 2 years ago

Ah, I didn't realize using **p in the definition would allow keys that weren't identifiers. Sure, that plan seems reasonable then.

tom4everitt commented 2 years ago

Fixed now. In the end I went with option 1, breaking backwards compatibility to avoid the code complexity associated with 2. commit: e50ee06b7eafac63fe7c9471764c9c5774fc743b