Alessiobrini / Deep-Reinforcement-Trading-with-Predictable-Returns

Reinforcement Learning framework to make synthetic experiments in the financial domain
Module for behavioral cloning #2

Open Alessiobrini opened 3 years ago

Alessiobrini commented 3 years ago

Insert a module after the algorithm's loss computation that perturbs the parameters by doing behavioral cloning from an expert (the Garleanu and Pedersen solution).

Alessiobrini commented 3 years ago

Following this paper, at first glance we should:

Possible issues:

Alessiobrini commented 3 years ago

I noticed that the log loss as described in the paper assumes that you use your actor again to compute the actions for the states in the batch. I can do the same with another forward pass over the Qnet.
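A minimal sketch of what that extra forward pass could look like, in plain Python. Everything here is an assumption for illustration: `q_net` is a hypothetical callable that returns one Q-value per discrete action, `action_values` is a hypothetical grid of tradable amounts, and the greedy (argmax) choice stands in for the actor's output. It is not the repository's code.

```python
def greedy_actions(q_net, states, action_values):
    """Run a forward pass of q_net on each state and return the greedy action.

    q_net: hypothetical callable mapping a state to a list of Q-values,
           one per entry of action_values.
    states: batch of states sampled from the replay buffer.
    action_values: hypothetical discrete action grid (e.g. traded amounts).
    """
    actions = []
    for state in states:
        q_values = q_net(state)
        # Index of the largest Q-value, i.e. the greedy action.
        best_index = max(range(len(q_values)), key=q_values.__getitem__)
        actions.append(action_values[best_index])
    return actions


# Toy stand-in for the network: prefers the action closest to the state.
ACTIONS = [-1.0, 0.0, 1.0]

def toy_q_net(state):
    return [-(state - a) ** 2 for a in ACTIONS]

batch_actions = greedy_actions(toy_q_net, [-0.9, 0.2, 1.4], ACTIONS)
```

The resulting `batch_actions` can then be compared against the expert's actions for the same states.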

Alessiobrini commented 3 years ago

There are two ways to do this knowledge injection into the training process:

Then the choice of the loss determines what we want to do exactly:

Alessiobrini commented 3 years ago

The current implementation adds an MSE loss to the total loss, scaled by a rescaling factor. It is now in the testing phase.
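As a rough sketch of the idea (not the actual implementation), the combined objective could look like the following. The names `rl_loss`, `agent_actions`, `expert_actions`, and `bc_weight` are hypothetical, and I am assuming the MSE is taken between the agent's actions and the expert's actions on the same batch of states.

```python
def mse(predicted, target):
    """Mean squared error between two equally sized sequences of numbers."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not predicted:
        raise ValueError("cannot compute MSE of an empty batch")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def total_loss(rl_loss, agent_actions, expert_actions, bc_weight):
    """Add a rescaled behavioral cloning term to the usual RL loss.

    rl_loss: the loss the algorithm already computes (e.g. the TD loss).
    agent_actions: actions chosen by the agent for the batch states.
    expert_actions: actions the expert would take in the same states.
    bc_weight: hypothetical rescaling factor for the cloning term.
    """
    return rl_loss + bc_weight * mse(agent_actions, expert_actions)


# Example: one of the two actions matches the expert, the other is off by 2.
loss = total_loss(0.5, [1.0, 2.0], [1.0, 4.0], bc_weight=0.1)
```

With `bc_weight` set to 0 this reduces to the original loss, so the factor controls how strongly the expert pulls on the parameters.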

Alessiobrini commented 3 years ago

Also added the implementation to the MisspecDQN case.

Alessiobrini commented 3 years ago

It still needs to be added to the PPO algorithm.