I want to build an RL algo that will understand the concept of beating a benchmark (say the S&P 500) at the tic level. So if a tic is consistently beating the benchmark, the algo should prefer to pick that tic more often than a tic that keeps losing to the benchmark.
How should I make this happen?
Can I set up a feature that checks on a monthly basis whether a tic beat the benchmark and sends this as a signal to the RL algo? It could be a binary or a numeric feature (the delta between the tic's and the benchmark's monthly returns). But even then, this would just be a feature and would not really alter the reward signal. How do I alter the reward signal to achieve this?
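To make my intent concrete, here is a rough sketch of the kind of reward shaping I have in mind: instead of rewarding raw portfolio return, reward the excess return over the benchmark for the same period. The function name, `scale` parameter, and the example numbers are all hypothetical, just to illustrate the idea:

```python
import numpy as np

def excess_return_reward(tic_returns, weights, benchmark_return, scale=1.0):
    """Hypothetical shaped reward: portfolio return minus benchmark return.

    tic_returns      : per-tic returns for the period, shape (n_tics,)
    weights          : portfolio weights chosen by the agent, shape (n_tics,)
    benchmark_return : benchmark (e.g. S&P 500) return for the same period
    scale            : optional scaling factor for the reward magnitude
    """
    portfolio_return = float(np.dot(weights, tic_returns))
    # Positive when the portfolio beats the benchmark, negative when it loses.
    return scale * (portfolio_return - benchmark_return)

# Example: two tics, one beating and one losing to the benchmark.
r = excess_return_reward(
    tic_returns=np.array([0.03, -0.01]),  # monthly returns per tic
    weights=np.array([0.7, 0.3]),         # agent's allocation
    benchmark_return=0.01,                # benchmark monthly return
)
# Portfolio return = 0.7*0.03 + 0.3*(-0.01) = 0.018, so reward = 0.008
```

With a reward like this, the agent is only rewarded for outperforming the benchmark, so overweighting tics that consistently beat it should follow directly from maximizing expected reward, rather than having to be inferred from an input feature.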