I want to build an RL algo that will understand the concept of beating a benchmark (say the S&P 500) at the tic level. So if a tic is consistently beating the benchmark, the algo should prefer to pick that tic more often than a tic that keeps losing to the benchmark.
How should I make this happen?
Can I set up a feature that checks on a monthly basis whether a tic beat the benchmark and sends this as a signal to the RL algo? It could be a binary or a numeric feature (the delta between the tic's and the benchmark's monthly returns). But even then, this would just be a feature and would not really alter the reward signal. How do I alter the reward signal to achieve this?
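To make my intent concrete, here is a rough sketch of the kind of reward shaping I have in mind: instead of rewarding raw portfolio return, reward the excess return over the benchmark for the same period. The function name, `scale` parameter, and the example numbers are all hypothetical, just to illustrate the idea:

```python
import numpy as np

def excess_return_reward(tic_returns, weights, benchmark_return, scale=1.0):
    """Hypothetical shaped reward: portfolio return minus benchmark return.

    tic_returns      : per-tic returns for the period, shape (n_tics,)
    weights          : portfolio weights chosen by the agent, shape (n_tics,)
    benchmark_return : benchmark (e.g. S&P 500) return for the same period
    scale            : optional scaling factor for the reward magnitude
    """
    portfolio_return = float(np.dot(weights, tic_returns))
    # Positive when the portfolio beats the benchmark, negative when it loses.
    return scale * (portfolio_return - benchmark_return)

# Example: two tics, one beating and one losing to the benchmark.
r = excess_return_reward(
    tic_returns=np.array([0.03, -0.01]),  # monthly returns per tic
    weights=np.array([0.7, 0.3]),         # agent's allocation
    benchmark_return=0.01,                # benchmark monthly return
)
# Portfolio return = 0.7*0.03 + 0.3*(-0.01) = 0.018, so reward = 0.008
```

With a reward like this, the agent is only rewarded for outperforming the benchmark, so overweighting tics that consistently beat it should follow directly from maximizing expected reward, rather than having to be inferred from an input feature.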