Closed nick-harder closed 2 weeks ago
Attention: Patch coverage is 91.30435% with 8 lines in your changes missing coverage. Please review.
Project coverage is 77.96%. Comparing base (1b430ff) to head (ce8c055).
:exclamation: Current head ce8c055 differs from pull request most recent head 23a7ac9. Please upload reports for the commit 23a7ac9 to get more accurate results.
To simulate several markets with a single RL agent, the rewards must be saved only after all modelled markets have cleared and the final reward has been calculated. This PR makes the required changes to the framework by scheduling the buffer writes and policy updates every train_freq. If train_freq is configured to match the closure of all modelled markets, the correct reward is saved, which makes it possible to model several markets with one RL agent.
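
The scheduling idea can be illustrated with a small toy sketch. This is not the framework's code: the class `DeferredRewardAgent`, its methods, and the integer-step interpretation of `train_freq` are all hypothetical and chosen only to show why the buffer write has to wait until every market has cleared.

```python
# Toy sketch only: all names here are hypothetical, not the framework's API.
class DeferredRewardAgent:
    def __init__(self, train_freq):
        # Assumption: train_freq is a number of simulation steps.
        self.train_freq = train_freq
        self.pending_reward = 0.0  # reward accumulated since the last write
        self.buffer = []           # stands in for the replay buffer
        self.policy_updates = 0    # counts how often the policy would be updated

    def on_market_cleared(self, reward):
        # Each market contributes its share of the reward when it clears.
        self.pending_reward += reward

    def on_step(self, step):
        # Write to the buffer and update the policy only every train_freq steps.
        if (step + 1) % self.train_freq != 0:
            return False
        self.buffer.append(self.pending_reward)
        self.pending_reward = 0.0
        self.policy_updates += 1
        return True


def run(train_freq, clearings):
    # clearings maps a step to the reward of the market that clears at that step.
    agent = DeferredRewardAgent(train_freq)
    for step in sorted(clearings):
        agent.on_market_cleared(clearings[step])
        agent.on_step(step)
    return agent


# Two markets clear in alternation: one at even steps, one at odd steps.
clearings = {0: 1.0, 1: 0.5, 2: 2.0, 3: -0.5}
agent = run(train_freq=2, clearings=clearings)
partial = run(train_freq=1, clearings=clearings)
```

With `train_freq=2`, each buffer entry holds the combined reward of both markets. With `train_freq=1`, the buffer would instead receive one partial reward per market clearing, which is the situation the PR is meant to avoid.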