ds4dm / ecole

Extensible Combinatorial Optimization Learning Environments
https://www.ecole.ai
BSD 3-Clause "New" or "Revised" License
325 stars 68 forks source link

Extract reward before observation #209

Closed gasse closed 3 years ago

gasse commented 3 years ago

In-between two environment steps, the following happens:

From the perspective of an agent, if it can control the observation computation process as well as the action decision process, then those two operations can actually be fused into one, where the observation function directly returns the action which is to be taken.

Consider the following scenario: I use a configuring environment with the solving time as a reward. I want to discard the initial reward, as I do not want to account for the time it takes to load the instance file into SCIP. With the code as it is now, this will also discard the time it takes to compute the initial observation. As such, if I drop the initial reward, smart users might move all the computations required to evaluate their policy into the observation function, which will not be accounted for in the final reward.

The two operations (oi, ai) should be accounted for jointly in-between two environment steps. For example, think of the configuring environment. Previously the time was measured as follows:

It makes more sense, for credit assignment, to account for the time it takes to compute an observation and take a decision with respect to that observation within the same time period:

dchetelat commented 3 years ago

Yes, that makes sense. I'm ok with this.

Only minor thing, I have another PR where I'm playing with this exact same code. I think there will be a merge conflict: do you want me to do your suggestion in my PR or we keep them separately and we fix the conflict later?

dchetelat commented 3 years ago

Or maybe let's merge this PR first, I need to rebase mine anyways I think.

jdumouchelle commented 3 years ago

Good idea, I agree that this way of computing makes more sense.

gasse commented 3 years ago

Merged. Thanks !

AntoinePrv commented 3 years ago

The same should be done in the C++ file.

gasse commented 3 years ago

Hmmm good point.