Test all improvements of the learning process by performing a large simulation run

nick-harder commented 2 weeks ago

To validate and measure the effectiveness of recent improvements in the learning process, a comprehensive large-scale simulation run is necessary. This will help ensure that the enhancements not only improve performance in theory but also in practical, real-world scenarios.

Task Objectives:

Execute the Simulation:
- Run the large-scale simulation using both CPU and GPU configurations.
- Monitor the simulation process to capture performance metrics and identify any issues that arise during execution.

Analyze Results:
- Collect and analyze the performance data, comparing it to baseline measurements taken before the improvements were implemented.
- Evaluate the learning outcomes to ensure that the quality and accuracy have been maintained or enhanced.

Address Any Issues:
- Identify any new issues that emerged during the large-scale run and propose solutions.
- Make necessary adjustments and re-run simulations if required to ensure robustness and reliability.
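
The baseline comparison in the "Analyze Results" step could be sketched roughly as follows. This is only a minimal illustration: the function `relative_change` and the metric names are hypothetical and not part of the framework, and the numbers in the usage lines are placeholders, not measured results.

```python
from typing import Dict


def relative_change(baseline: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """Return (candidate - baseline) / |baseline| for every metric present in both runs."""
    changes = {}
    for name in baseline.keys() & candidate.keys():
        base = baseline[name]
        if base == 0:
            # A relative change is undefined for a zero baseline, so skip the metric.
            continue
        changes[name] = (candidate[name] - base) / abs(base)
    return changes


# Placeholder values for illustration only (hypothetical metric names).
baseline_run = {"runtime_s": 100.0, "avg_reward": 2.0}
improved_run = {"runtime_s": 80.0, "avg_reward": 2.5}
print(relative_change(baseline_run, improved_run))
```

A negative value for a runtime metric and a positive value for a reward metric would both indicate an improvement, so each metric has to be read with its own direction in mind.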

By conducting this large simulation run, we aim to thoroughly test all improvements in the learning process, ensuring they translate into tangible performance gains and maintain high-quality learning outcomes in a real-world setting.

kim-mskw commented 1 day ago

I ran several tests with example_02c, which already includes 10 learning agents that provide overcapacity to the market. Currently, they do not learn anything and resort to always bidding the highest price for p_flex and p_inflex. One can force them to bid lower by increasing the regret_scale to 1. This is, however, not the desired behavior.

For reference: with regret_scale = 0.9, the bids still do not quite reach the level of the units' marginal costs as found in Nick's paper.
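
To make the effect of regret_scale more concrete, here is a minimal sketch. It assumes a reward of the form "profit minus regret_scale times opportunity cost"; the function `regret_reward`, its arguments, and the numbers are illustrative assumptions, and the actual reward in the framework may contain additional terms or scaling.

```python
def regret_reward(price: float, marginal_cost: float, accepted_volume: float,
                  max_power: float, regret_scale: float) -> float:
    """Assumed reward shape: profit minus a scaled penalty for unsold, profitable capacity."""
    profit = (price - marginal_cost) * accepted_volume
    # Opportunity cost: margin the unit could have earned on the capacity it did not sell.
    # It is clipped at zero so that staying out of an unprofitable market is not penalized.
    opportunity_cost = max((price - marginal_cost) * (max_power - accepted_volume), 0.0)
    return profit - regret_scale * opportunity_cost


# Illustrative numbers: a unit whose bid was rejected although the price exceeded its marginal cost.
for scale in (0.0, 0.9, 1.0):
    print(scale, regret_reward(price=50.0, marginal_cost=30.0, accepted_volume=0.0,
                               max_power=100.0, regret_scale=scale))
```

Under this assumed form, a rejected bid costs nothing when regret_scale is 0, while a larger regret_scale makes every rejected but profitable bid increasingly expensive. That would explain why raising it pushes the agents toward lower bids even if they have not learned the market dynamics.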

kim-mskw commented 1 day ago

Changed Observation Space

I changed the foresight based on the results of the Shapley Values for example 02b and 02c. For example, in this graph you can see that the most immediate forecasts have the highest influence while the others decrease in influence rapidly:

Tests for changed foresight and hindsight:

example 02a: The result is not as desired.

example 02b
- Bids of RL units in example 02b: GOOD
- Prices in example 02b: could be BETTER, but they are in line with the previous tests, where the behavior was stochastic and the prices were sometimes higher and sometimes lower than the Harder et al. results

example 02c: Unfortunately, changing the observation space to an 8-hour price forecast and 8 hours of price hindsight does not change the behavior.
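
For readers unfamiliar with the setup, the changed observation window can be sketched as below. This is an assumption-laden illustration, not the framework's implementation: the function `price_observation`, the padding rule at the edges of the horizon, and the toy price series are all hypothetical.

```python
from typing import List, Sequence


def price_observation(prices: Sequence[float], forecast: Sequence[float], t: int,
                      hindsight: int = 8, foresight: int = 8) -> List[float]:
    """Build [last `hindsight` realized prices, next `foresight` forecast prices] at step t."""
    past = list(prices[max(t - hindsight, 0):t])
    future = list(forecast[t:t + foresight])
    # Near the start or end of the horizon fewer values exist, so pad by repeating
    # the nearest available value (or 0.0 if nothing is available at all).
    past_fill = past[0] if past else 0.0
    future_fill = future[-1] if future else 0.0
    past = [past_fill] * (hindsight - len(past)) + past
    future = future + [future_fill] * (foresight - len(future))
    return past + future


# Toy hourly series for one day, for illustration only.
realized = [float(h) for h in range(24)]
forecasted = [100.0 + h for h in range(24)]
print(price_observation(realized, forecasted, t=10))
```

The resulting vector always has hindsight + foresight entries, so the input dimension of the policy network stays fixed regardless of the time step.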

kim-mskw commented 9 hours ago

In-depth analysis of 02c:

The power plants learn to bid at different price levels. The power plant with the highest price is never activated, and yet it does not bid lower, which suggests that it has not learned.

The only difference from Nick's case is the training over one year; otherwise, neither my nor Nick's debugging has found any error in the learning.
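
A check for this pattern (never activated, yet not bidding lower) could look roughly like the sketch below. The function `stuck_units`, the unit names, and the toy data are hypothetical; comparing the mean bid of the first and second half of the run is just one simple assumption about what "bidding lower" means.

```python
from statistics import mean
from typing import Dict, List, Sequence


def stuck_units(bids: Dict[str, Sequence[float]],
                accepted: Dict[str, Sequence[float]]) -> List[str]:
    """Return units that were never accepted and did not lower their average bid over time."""
    stuck = []
    for unit, unit_bids in bids.items():
        half = len(unit_bids) // 2
        if half == 0:
            # Too few bids to compare an early and a late phase.
            continue
        never_accepted = all(volume == 0 for volume in accepted[unit])
        # Compare the average bid late in the run with the average bid early in the run.
        lowered = mean(unit_bids[half:]) < mean(unit_bids[:half])
        if never_accepted and not lowered:
            stuck.append(unit)
    return sorted(stuck)


# Toy data for illustration only.
example_bids = {"pp_1": [90, 90, 90, 90], "pp_2": [90, 80, 60, 50], "pp_3": [40, 40, 40, 40]}
example_accepted = {"pp_1": [0, 0, 0, 0], "pp_2": [0, 0, 0, 0], "pp_3": [10, 10, 10, 10]}
print(stuck_units(example_bids, example_accepted))
```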

assume-framework / assume

Test all improvements of the learning process by performing a large simulation run #362
