Closed nick-harder closed 3 months ago
Attention: Patch coverage is 92.52336% with 8 lines in your changes missing coverage. Please review.
Project coverage is 78.31%. Comparing base (1b430ff) to head (1cb7765).
@maurerle @kim-mskw it would be great if you could please review this PR today so I can continue working on the sequential market testing and implementation. Thanks in advance!
I looked over the code; some things are still unclear to me. I think it would be best if Kim tests this branch as well to confirm that everything still works and gives reasonable results.
Thanks! I have tested learning for the single-agent and multi-agent cases (2a and 2b) and everything works for me. But yes, it would be great to get a second look, since this PR brings many changes.
To simulate several markets with one single RL agent, we need to save the rewards only after all the modeled markets have been cleared and the final reward has been calculated. This PR brings the required changes to the framework by scheduling the buffer writes and policy updates every `train_freq` steps. If `train_freq` is configured to match the closure of all modeled markets, the correct reward is saved, which makes it possible to model several markets with one RL agent.
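The idea above can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual framework code: the `RewardScheduler` class, its method names, and the market IDs are all assumptions made for the example. It shows why `train_freq` must line up with the closure of all modeled markets before a reward is committed to the buffer.

```python
# Hypothetical sketch (not the framework's real API): collect per-market
# rewards and commit them to the buffer only once per train_freq steps,
# i.e. only after all modeled markets have cleared.

class RewardScheduler:
    def __init__(self, market_ids, train_freq):
        self.market_ids = set(market_ids)  # markets the RL agent participates in
        self.train_freq = train_freq       # steps between buffer writes / policy updates
        self.pending = {}                  # market_id -> reward collected this cycle
        self.buffer = []                   # stand-in for the replay buffer
        self.step = 0

    def on_market_cleared(self, market_id, reward):
        """Record the reward from one market clearing."""
        self.pending[market_id] = reward

    def on_step_end(self):
        """Called once per simulation step; commits only when the cycle completes."""
        self.step += 1
        if self.step % self.train_freq != 0:
            return False
        # At the configured frequency, all modeled markets should have cleared,
        # so the summed reward is the agent's final reward for this cycle.
        assert set(self.pending) == self.market_ids, \
            "train_freq is misaligned with the market closures"
        self.buffer.append(sum(self.pending.values()))
        self.pending.clear()
        return True  # the caller would trigger a policy update here


# Two markets, so train_freq=2 matches the closure of both.
sched = RewardScheduler(market_ids=["EOM", "CRM"], train_freq=2)
sched.on_market_cleared("EOM", 1.0)
assert sched.on_step_end() is False  # only one market cleared; nothing saved yet
sched.on_market_cleared("CRM", 0.5)
assert sched.on_step_end() is True   # cycle complete: reward 1.5 goes to the buffer
print(sched.buffer)                  # -> [1.5]
```

If `train_freq` were set to 1 in this sketch, the assertion would fire on the first step, which mirrors the misconfiguration the PR description warns against: saving before all markets have cleared would record an incomplete reward.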