assume-framework / assume

ASSUME - Agent-based Simulation for Studying and Understanding Market Evolution
https://assume.readthedocs.io

Implement learning functionalities #3

Closed nick-harder closed 1 year ago

nick-harder commented 1 year ago

Start with the implementation of the learning functions

kim-mskw commented 1 year ago

Integration Learning Discussion 22.05

Base: Nick's implementation of flexRL

Implementation Decisions

- Dynamic learning algorithm specification: based on the algorithm chosen in the config, the init and update policy functions should be set. The rest of the code should work regardless of the algorithm.

- Reuse the replay buffer already written in common of assume.

- Coordination of GPU & CPU: the choice of an action, i.e. applying a bidding strategy, is made by the neural net of the learning agent, so it runs on the GPU. The rest of the simulation, however, runs on the CPU. Sending data from the GPU to the CPU takes quite some time, so Nick's code currently handles this by collecting all operational windows of all agents and then writing them from the GPU to the CPU. Two options:

  1. One unit operator per GPU: if learning is activated, we have only one specific learning unit operator that handles all learning units, meaning the unit operator column is ignored if learning = 1. When formulating bids from the operational window, the transformation from GPU to CPU is then done.

  2. Detach from the unit operator and integrate an intermediate step handling that (maybe later): we want to avoid more unnecessary messaging, hence collecting and coordinating info is done in a supplementary function which collects the data. Discussion points: How does the function know whether all data has been received if we have asynchronous data or can dynamically subscribe to markets? Could it be done in the market? Yes, as long as we have only one GPU, so how do we handle multiple GPUs?

- Forecast generation for the observation space: in the future we want to have a forecasting role that handles the different forecasts needed and sends them to the agents, so that we can have diverging forecasts (see respective issue). First, we want to calculate the expected merit order price according to the input files and read it as an observation, similar to the handling of the fuel_prices. The residual load forecast can be taken from SMARD/ENTSO-E Transparency or just perfect foresight.
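
To make the last point concrete, here is a minimal sketch of how an expected merit order price could be derived from plant data and a residual load value. It is not the ASSUME implementation: the `Plant` class, the `merit_order_price` function, the example numbers, and the fallback rules for non-positive load and for load above total capacity are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Plant:
    """Hypothetical plant record; field names are illustrative only."""
    name: str
    capacity_mw: float
    marginal_cost: float  # EUR/MWh


def merit_order_price(plants, residual_load_mw):
    """Return the marginal cost of the last plant needed to cover the load."""
    if not plants:
        raise ValueError("at least one plant is required")
    if residual_load_mw <= 0:
        # Assumption: no dispatchable generation needed, so the price is zero.
        return 0.0
    served = 0.0
    # Dispatch plants from cheapest to most expensive.
    for plant in sorted(plants, key=lambda p: p.marginal_cost):
        served += plant.capacity_mw
        if served >= residual_load_mw:
            return plant.marginal_cost
    # Assumption: if the load exceeds total capacity, fall back to the
    # most expensive plant instead of modelling scarcity pricing.
    return max(p.marginal_cost for p in plants)


plants = [
    Plant("lignite", 1000.0, 30.0),
    Plant("gas", 500.0, 80.0),
    Plant("wind", 300.0, 0.0),
]
# One expected price per time step of a (made-up) residual load forecast.
forecast = [merit_order_price(plants, load) for load in (200.0, 1200.0, 1400.0)]
```

The resulting list could then be read as one more observation series next to the fuel prices; how it is actually passed to the agents is left open here.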

kim-mskw commented 1 year ago

Architecture Discussion to accommodate learning:

General

Learning role

Unit

Unit Operator

RL Strategy

RL Algorithms

maurerle commented 1 year ago

Relevant progress was made in #130. Kim is currently working on a functioning sampling method including the MATD3.

kim-mskw commented 1 year ago

A running version of the learning is in main. Yet the learning itself, especially the update function, does not work. I would suggest the following next steps:

Visualisation

Get Learning to learn

Clean Learning

Implement Evaluation and saving

nick-harder commented 1 year ago

@maurerle this one is done as well, right?

maurerle commented 1 year ago

Only the tests are missing, which are also part of #143 - so yes, we can close this. I am currently doing the RL tests.