ds4dm / ecole

Extensible Combinatorial Optimization Learning Environments
https://www.ecole.ai
BSD 3-Clause "New" or "Revised" License
325 stars 68 forks source link

Reusing data functions lead to wrong results #197

Open AntoinePrv opened 3 years ago

AntoinePrv commented 3 years ago

Describe the bug

Doing something like

r = ecole.reward.XXX()
ecole.environment.Branching(
    reward_function=(r + r**2),
)

Or

r = ecole.reward.XXX()
ecole.environment.Branching(
    reward_function=r,
    information_function={"r": r},
)

Might potentially give an error, or worse give wrong results, because data functions assume before_reset / extract to be called once per episode / transition.

Expected behavior

I think making the previous example work as expected is a better option than forbidding reusing function. Even if documented, the latter would still leave the door open to silent bugs.

One solution could be that extract (and before_reset) recieve an id, like transition_number, and be required to be a no-opt and return the same when a same transition_number is given again.

Additional context

Alternatively, the environment could have made cached the output of data function, but this does not work in the case of a formula because the data function are captured privately (they could be deep-copied though).

AntoinePrv commented 2 years ago

Two ways this could be handled: