
ASSUME - Agent-based Simulation for Studying and Understanding Market Evolution
https://assume.readthedocs.io

adjust query of reward during training #256

Closed by nick-harder 9 months ago

nick-harder commented 9 months ago

- before, the query returned the mean of all rewards
- now it is per unit, which is better
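
A minimal sketch of what the change means for the reward query (the table layout, column names, and numbers below are invented for illustration and are not the actual ASSUME schema):

```python
import pandas as pd

# Hypothetical rewards table: one row per unit and training episode.
rewards = pd.DataFrame(
    {
        "unit_id": ["nuclear_1", "gas_1", "wind_1", "nuclear_1", "gas_1", "wind_1"],
        "episode": [1, 1, 1, 2, 2, 2],
        "reward": [120.0, 5.0, 2.0, 125.0, 6.0, 1.5],
    }
)

# Before: one mean over all rewards, so a single high-earning unit
# (e.g. nuclear) dominates the metric.
overall_mean = rewards["reward"].mean()

# After: rewards are queried per unit, so each unit's learning progress
# stays visible on its own.
per_unit_mean = rewards.groupby("unit_id")["reward"].mean()

print(overall_mean)
print(per_unit_mean)
```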

codecov[bot] commented 9 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (`357f0ba`) 78.44% compared to head (`a1314f7`) 78.45%.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             main     #256      +/-   ##
==========================================
  Coverage   78.44%   78.45%
==========================================
  Files          39       39
  Lines        4259     4260      +1
==========================================
+ Hits         3341     3342      +1
  Misses        918      918
```

| [Flag](https://app.codecov.io/gh/assume-framework/assume/pull/256/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=assume-framework) | Coverage Δ | |
|---|---|---|
| [pytest](https://app.codecov.io/gh/assume-framework/assume/pull/256/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=assume-framework) | `78.45% <100.00%> (+<0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=assume-framework#carryforward-flags-in-the-pull-request-comment) to find out more.


kim-mskw commented 9 months ago

I am not quite sure that this is the smartest choice game-theoretically. We aim for the highest sum of all rewards, not the highest average per unit.

nick-harder commented 9 months ago

Why the highest sum of all rewards? We want each unit to perform as well as possible. In the current approach, one unit performing really well outshines all the other units that haven't learned anything. For example, nuclear can earn a lot without much effort and has a huge reward, while the others didn't learn much and their reward gets lost in the overall mean.

nick-harder commented 9 months ago

Also, I have learned that taking the max reward is not anywhere close to the equilibrium point. We should introduce a mechanism in the future which checks changes in rewards per unit and exits if no changes in behavior have been observed for some period of time.
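
A rough sketch of what such a convergence check could look like (the window size, tolerance, and all names here are hypothetical, not existing ASSUME code):

```python
import numpy as np

def rewards_converged(
    reward_history: dict[str, list[float]],
    window: int = 10,
    tol: float = 1e-3,
) -> bool:
    """Return True if every unit's mean reward changed by at most `tol`
    between the last two windows of `window` episodes each."""
    for unit_id, unit_rewards in reward_history.items():
        if len(unit_rewards) < 2 * window:
            return False  # not enough episodes recorded yet
        recent = np.mean(unit_rewards[-window:])
        previous = np.mean(unit_rewards[-2 * window:-window])
        if abs(recent - previous) > tol:
            return False
    return True

# Inside the training loop one could then stop early, e.g.:
# if rewards_converged(reward_history):
#     break
```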

kim-mskw commented 9 months ago

Mhhh, I do not see it that way: according to game theory, the Nash Equilibrium is where the overall welfare is highest, since we have a fixed demand that equals the production rent. If the overall welfare (i.e. the absolute sum) is higher when the nuclear plant earns a shit ton of money and the rest do not earn anything, then that is the Nash Equilibrium, regardless of how fair the result is.

nick-harder commented 9 months ago

@kim-mskw I don't agree with this definition. Maybe it is the case for some particular market designs, but not for a general market setup. A NE is reached when no agent can gain by unilaterally deviating from its policy, so ultimately we should have such a condition for MADRL setups. But for now I believe the average reward of the agents is a better representation than the sum of all rewards.
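
For reference, the condition being paraphrased here is the standard one: a joint policy $\pi^* = (\pi_1^*, \dots, \pi_n^*)$ is a Nash Equilibrium if no agent $i$ can increase its own expected return by unilaterally switching to another policy $\pi_i$,

$$
u_i(\pi_i^*, \pi_{-i}^*) \ge u_i(\pi_i, \pi_{-i}^*) \quad \text{for all agents } i \text{ and all } \pi_i,
$$

which by itself says nothing about whether the sum or the mean of rewards is maximal at that point.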

kim-mskw commented 9 months ago

@nick-harder after our bilateral talk I thought about that a lot. You are right: the Nash Equilibrium (or one of the multiple equilibria) is not the state where the sum of all profits/rewards is maximal, but neither is it the state where the average profit/reward of all units is highest. Both are only approximations. Frankly, I could not find evidence in the multi-agent reinforcement learning literature hinting at which metric to prefer.

With the mean we currently just divide the sum by the number of agents, so I came to the conclusion that it should not make any difference anyway. Hence, my initial thought that it needed to be the sum was wrong.
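
To make that concrete, a quick numeric check (illustrative unit names and numbers only, not taken from an actual run):

```python
# Two hypothetical reward snapshots for the same three units.
rewards_a = {"nuclear_1": 120.0, "gas_1": 5.0, "wind_1": 2.0}
rewards_b = {"nuclear_1": 100.0, "gas_1": 20.0, "wind_1": 15.0}

for label, rewards in [("A", rewards_a), ("B", rewards_b)]:
    total = sum(rewards.values())
    mean = total / len(rewards)
    print(f"{label}: sum={total:.1f}, mean={mean:.1f}")

# With a fixed set of agents, mean = sum / n, so both metrics always rank
# snapshots in the same order; only the scale differs.
```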