Open nick-harder opened 2 weeks ago
I ran several tests with example_02c, which already includes 10 learning agents that provide overcapacity to the market. Currently, they do not learn anything and resort to always bidding the highest price for p_flex and p_inflex. They can be forced to bid lower by increasing regret_scale to 1, but this is not the desired behavior.
For reference: with regret_scale = 0.9, the bids are still not quite at the level of the units' marginal costs as found in Nick's paper.
I changed the foresight based on the Shapley value results for example 02b and 02c. For example, this graph shows that the most immediate forecasts have the highest influence, while the influence of the later ones decreases rapidly:
Tests for changed foresight and hindsight:
example 02a: The result is not as desired.
example 02b:
Bids of RL units in example 02b <- GOOD
Prices in example 02b <- Could be BETTER, but they are in line with the previous tests, where behavior was stochastic and prices were sometimes higher and sometimes lower than the Harder et al. results
example 02c: Unfortunately, changing the observation space to an 8-hour forecast and 8 hours of hindsight on prices does not change the behavior.
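To make the foresight/hindsight setting concrete, here is a minimal sketch of how a price observation with 8 hours of hindsight and 8 hours of forecast could be assembled at a given time step. It is not the framework's actual implementation: the function name `price_observation`, the padding rule at the series boundaries, and the choice to start the forecast window at the current step are all assumptions made for illustration.

```python
from typing import List, Sequence


def price_observation(
    prices: Sequence[float],
    t: int,
    foresight: int = 8,
    hindsight: int = 8,
) -> List[float]:
    """Build a price observation for time step t (hypothetical helper).

    The result is [past prices..., forecast prices...]:
    - `hindsight` values from steps t-hindsight .. t-1
    - `foresight` values from steps t .. t+foresight-1
    Indices outside the series are padded with the nearest
    available price (an assumption, not taken from the issue).
    """
    if not prices:
        raise ValueError("prices must not be empty")
    last = len(prices) - 1

    def clamp(i: int) -> float:
        # Clip the index into the valid range [0, last].
        return prices[min(max(i, 0), last)]

    past = [clamp(i) for i in range(t - hindsight, t)]
    future = [clamp(i) for i in range(t, t + foresight)]
    return past + future


# Toy series: 24 hourly prices from 20.0 to 43.0 (made-up numbers).
prices = [float(p) for p in range(20, 44)]
obs = price_observation(prices, t=10)
obs_start = price_observation(prices, t=0)
```

With these defaults the observation has 16 price entries. Shrinking `foresight` to only the most immediate hours would correspond to the change motivated by the Shapley analysis above.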
In depth analysis of 02c:
Power plants learn to bid at different price levels. The highest-priced power plant (PP) is never activated, yet it does not bid lower, which suggests that it has not learned.
The only difference from Nick's case is the training over one year; otherwise, neither my debugging nor Nick's has found any error in the learning.
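The "never activated" observation can be checked systematically with a small diagnostic like the sketch below. It assumes the results are available as `(unit_id, bid_price, clearing_price)` records and that a bid counts as accepted when its price does not exceed the clearing price. Partial acceptance of the marginal bid is ignored. The unit names and all numbers are made-up toy data, and `acceptance_share` and `never_activated` are hypothetical helpers, not part of any existing API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One record per unit and hour: (unit_id, bid_price, clearing_price).
BidRecord = Tuple[str, float, float]


def acceptance_share(bids: Iterable[BidRecord]) -> Dict[str, float]:
    """Share of hours in which each unit's bid was at or below the clearing price."""
    total: Dict[str, int] = defaultdict(int)
    accepted: Dict[str, int] = defaultdict(int)
    for unit, bid_price, clearing_price in bids:
        total[unit] += 1
        # Assumption: uniform-price clearing, so a bid is accepted
        # whenever it does not exceed the clearing price.
        if bid_price <= clearing_price:
            accepted[unit] += 1
    return {unit: accepted[unit] / total[unit] for unit in total}


def never_activated(shares: Dict[str, float]) -> List[str]:
    """Units whose bids were never accepted in the sample."""
    return sorted(unit for unit, share in shares.items() if share == 0.0)


# Toy data (made up): pp_10 always bids above the clearing price.
records = [
    ("pp_01", 30.0, 45.0),
    ("pp_01", 32.0, 31.0),
    ("pp_10", 90.0, 45.0),
    ("pp_10", 90.0, 31.0),
]
shares = acceptance_share(records)
idle_units = never_activated(shares)
```

A unit that appears in `idle_units` over the whole evaluation period while keeping its bid price unchanged would match the pattern described above.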
To validate and measure the effectiveness of recent improvements in the learning process, a comprehensive large-scale simulation run is necessary. This will help ensure that the enhancements not only improve performance in theory but also in practical, real-world scenarios.
Task Objectives:
Execute the Simulation:
Analyze Results:
Address Any Issues:
By conducting this large simulation run, we aim to thoroughly test all improvements in the learning process, ensuring they translate into tangible performance gains and maintain high-quality learning outcomes in a real-world setting.