Closed — kim-mskw closed this 1 month ago
Okay, I know why it is always the same: it somehow now always loads old policies from a dead folder. I am fixing that. This is terribly confusing.
So there was a policy-loading function in learning_strategies and another one in learning_algo, which led to very weird behaviour after the load_scenario function: in the eval run, an old policy from a folder was always read.
Technically we need the actor in the bidding strategy, because otherwise it is not initialized when we do not use learning (learning mode off, and hence no algorithm either). This double initialization is weird, though.
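The double initialization could be sidestepped with a lazy, create-once pattern: the bidding strategy owns the actor slot, and the learning algorithm (when present) injects its own actor instead of building a second one. This is a hypothetical sketch, not the actual ASSUME code.

```python
class BiddingStrategy:
    """Hypothetical sketch: the strategy owns the actor so that bidding
    works even with learning disabled, while a learning algorithm can
    inject its actor to avoid initializing it twice."""

    def __init__(self):
        self.actor = None

    def set_actor(self, actor) -> None:
        # Called by the learning algorithm when learning mode is on.
        self.actor = actor

    def get_actor(self, factory):
        # Create the actor only on first use (learning mode off).
        if self.actor is None:
            self.actor = factory()
        return self.actor
```

With this shape there is exactly one owner of the actor, and whether learning is on or off only changes *who* fills the slot, never how many times it is created.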
Attention: Patch coverage is 78.78788%, with 7 lines in your changes missing coverage. Please review.
Project coverage is 78.07%. Comparing base (ef8da6f) to head (3e997f1).
| Files | Patch % | Lines |
|---|---|---|
| assume/reinforcement_learning/learning_role.py | 53.84% | 6 Missing :warning: |
| assume/scenario/loader_csv.py | 94.11% | 1 Missing :warning: |
So the old save-policies function did not really use max_eval: it was always reset to -1000 whenever the learning role was created anew. I changed that and added an early stopping criterion. Something works better now, but something weird remains, because the max avg metric is now always 19.28.
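For the early stopping criterion, one common shape is to keep the best average evaluation metric in a tracker that survives across eval runs (so it is not reset to -1000 with each new learning role) and stop after a fixed number of non-improving evaluations. A minimal sketch with hypothetical names (`EarlyStopper`, `patience`, `min_delta`):

```python
class EarlyStopper:
    """Hypothetical sketch: track the best average eval metric across
    evaluation runs and signal a stop after `patience` evaluations
    without improvement of at least `min_delta`."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.best = float("-inf")  # persists across eval runs
        self.patience = patience
        self.min_delta = min_delta
        self.bad_evals = 0

    def update(self, avg_metric: float) -> bool:
        """Record one evaluation; return True if training should stop."""
        if avg_metric > self.best + self.min_delta:
            self.best = avg_metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

If the tracker is constructed once per experiment rather than once per learning role, `best` keeps its value between evaluations; a metric that is constantly 19.28 would then show up as repeated non-improving evals and trigger the stop.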
This is the basis for the discussion with @nick-harder tomorrow.