AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)
MIT License

Alternate way of training the agent to reduce training time (Suggestion/Discussion) #20

Closed · sword134 closed this issue 4 years ago

sword134 commented 4 years ago

Hello, I've been wondering whether it would be possible to integrate this kind of feature into gym-anytrading. Basically, it goes like this:

The problem

From what I know about RL, the policy is initialized randomly and the agent is rewarded according to its actions. In trading, this potentially means it can take millions of iterations across the dataset before the agent even comes up with a strategy that is remotely successful, after which it spends even more time optimizing that strategy. In the end, you are presented with a model that is simply maximizing its reward. Depending on the reward structure, this can mean that if you maximize total net worth you might end up with a model applying a scalping strategy when you would personally have preferred one that swing trades. So how can we adjust for this and make training faster in the process? Normally you would add several reward terms to reward the agent based on the kind of strategy you want it to employ; however, there might be another way.

The solution

A theoretical concept I have been toying with is this: instead of defining reward functions based on net worth, Sortino ratio, etc., we could go through our dataset and place buy and sell markers, either manually or mathematically. These markers are where we would ideally want the RL agent to enter and exit trades, so we reward the agent only when it trades at these points/prices. The obvious concern here is overfitting. First, as in all forms of ML, you have to split your dataset into training and testing sets, which lets people see whether the agent is actually overfitting the training data. Second, a precision parameter could be set; this parameter would define a range, in percent, around the specified buy and sell prices within which we would still reward the agent for buying and selling.
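To make this concrete, here is a rough sketch of what such a reward function could look like as a gym-anytrading environment subclass. I'm assuming the environment still exposes `self.prices` and `self._current_tick` as in the current `StocksEnv`; the `buy_marks`, `sell_marks`, and `tolerance_pct` parameters, as well as the reward and penalty values, are made up for illustration.

```python
import numpy as np
from gym_anytrading.envs import StocksEnv, Actions


class MarkedRewardEnv(StocksEnv):
    """StocksEnv variant that rewards trading near pre-marked entry/exit prices.

    `buy_marks` / `sell_marks` (arrays of marked prices) and `tolerance_pct`
    are hypothetical parameters introduced for this sketch.
    """

    def __init__(self, df, window_size, frame_bound,
                 buy_marks, sell_marks, tolerance_pct=1.5):
        super().__init__(df, window_size, frame_bound)
        self.buy_marks = np.asarray(buy_marks, dtype=float)
        self.sell_marks = np.asarray(sell_marks, dtype=float)
        self.tolerance_pct = tolerance_pct

    def _near_any(self, price, marks):
        # True if `price` lies within +/- tolerance_pct of any marked price.
        if marks.size == 0:
            return False
        return bool(np.any(np.abs(price - marks) / marks * 100 <= self.tolerance_pct))

    def _calculate_reward(self, action):
        price = self.prices[self._current_tick]
        if action == Actions.Buy.value:
            # Reward buying near a marked entry, punish buying elsewhere
            # (the +1.0 / -0.1 values are arbitrary placeholders).
            return 1.0 if self._near_any(price, self.buy_marks) else -0.1
        if action == Actions.Sell.value:
            return 1.0 if self._near_any(price, self.sell_marks) else -0.1
        return 0.0
```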

An example

Take a daily chart of Apple. If we apply a swing trading strategy, the perfect buy entry would occur on the 23rd of March 2020 at a price of $212.61; we mark this as our buy entry. The perfect sell exit would occur on the 13th of July 2020 at a price of $399.82; we mark this as the sell exit. You would keep doing this, either manually or mathematically, across the entire dataset you want to train on. Next, we set a precision parameter, in this case 1.5, meaning we still reward the agent if it buys at a price within ±1.5% of $212.61 and sells at a price within ±1.5% of $399.82.
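To make the precision parameter concrete, the 1.5% setting implies the following acceptance bands around the two marked prices (plain arithmetic, same numbers as above):

```python
# Acceptance bands implied by a 1.5% precision parameter
# (illustrative numbers from the Apple example above).
buy_price, sell_price, tolerance_pct = 212.61, 399.82, 1.5

buy_band = (buy_price * (1 - tolerance_pct / 100), buy_price * (1 + tolerance_pct / 100))
sell_band = (sell_price * (1 - tolerance_pct / 100), sell_price * (1 + tolerance_pct / 100))

print(buy_band)   # approximately (209.42, 215.80)
print(sell_band)  # approximately (393.82, 405.82)
```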

The impact on the model

So how would this impact our RL agent? My theory is that the agent will try to create a strategy that generates entry and exit signals at these buy and sell points. This gives more control to the user, who can now specify which strategy they want to employ (swing trading, scalping, etc.). Besides controllability, the agent would presumably train faster, since it doesn't need to discover which entry and exit points generate the most reward (we did that for it); instead, it spends its time going over the dataset to find signals that trigger within the precision range of our entries and exits. Of course, if those signals also trigger outside the range, the agent gets punished, so as to avoid it constantly generating buy and sell signals on every bar.

Final words

This is of course just my take on things, and I am posting it here for two reasons. The first is that this could become a unique feature of gym-anytrading (I haven't seen it elsewhere), provided the second reason holds up. The second reason is to get feedback on the idea; I am still fairly new to RL, so if the experts out there think this won't work because of A, B, C or D, comment here and let's get a discussion going. After all, it is in everyone's best interest for gym-anytrading to be as good as possible, even if that means implementing new ideas that haven't been tried before, since they could turn out to be the best ones.

AminHP commented 4 years ago

Hi @sword134.

I have a question. Why do you want to stick with RL methods when you already have tags for your dataset? Wouldn't it be better to use supervised algorithms in this case? I mean, we are somehow training RL agents to find tags for us, so what's the point of using RL when we already have those tags?
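Just to illustrate what I mean by the supervised alternative: with a per-bar tag column, even a simple classifier could be trained to predict the tags directly. The windowed-close features, placeholder data, and hypothetical tag encoding (0 = hold, 1 = buy, 2 = sell) below are arbitrary assumptions, not a recommendation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_windows(prices, labels, window_size=10):
    # Turn each bar into a feature vector of the last `window_size` closes,
    # paired with that bar's tag.
    X, y = [], []
    for t in range(window_size, len(prices)):
        X.append(prices[t - window_size:t])
        y.append(labels[t])
    return np.array(X), np.array(y)


# Placeholder data standing in for a tagged price series
# (0 = hold, 1 = buy, 2 = sell are hypothetical tag values).
prices = np.cumsum(np.random.randn(1000)) + 100
labels = np.random.randint(0, 3, size=1000)

X, y = make_windows(prices, labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False)

clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # out-of-sample accuracy on the tags
```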

sword134 commented 4 years ago

@AminHP Let me ask you this. When we train our RL agent to find tags for us, it's a matter of training it to give us buy or sell signals before we know the end result (live trading). By labelling our entries and exits, we are asking the RL agent to fit a model/rule set that matches our entry and exit labels; we then test it on our testing dataset, counting on the fact that the same rules/model can be applied in live trading. We don't need RL to find tags for us; any human being can pull up a chart of Apple and find the best entries and exits. We need it to come up with signals that can identify these entries and exits so they can be used for actual trading.

AminHP commented 4 years ago

Well, as far as I know, when you have some knowledge about solving a problem, you can combine transfer learning with RL or use Fuzzy-RL in order to inject that knowledge into the agent.

In transfer learning, you have a model that your agent uses as a function approximator for Q-values. By pre-training this model on the tagged dataset, you can speed up the agent's training. Maybe this is what you are looking for (but it is tricky and I'm not sure exactly how to do it).
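A very rough PyTorch sketch of the pre-training idea (not tied to gym-anytrading): the network body is first trained on the tagged dataset as a buy/sell classifier and then reused to initialise a Q-network. All shapes, the placeholder data, and the decision to share only the body are assumptions; wiring the result into an actual RL library is the tricky part I mentioned.

```python
import torch
import torch.nn as nn

obs_dim, n_actions = 20, 2  # assumed observation size and action count

# Shared body, later reused by the Q-network.
body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                     nn.Linear(64, 64), nn.ReLU())
clf_head = nn.Linear(64, n_actions)        # predicts the buy/sell tag
classifier = nn.Sequential(body, clf_head)

# --- stage 1: supervised pre-training on (observation, tag) pairs ---
X = torch.randn(256, obs_dim)              # placeholder windowed observations
y = torch.randint(0, n_actions, (256,))    # placeholder buy/sell tags
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(classifier(X), y)
    loss.backward()
    opt.step()

# --- stage 2: reuse the pre-trained body as the Q-value approximator ---
q_head = nn.Linear(64, n_actions)          # fresh head for Q-values
q_network = nn.Sequential(body, q_head)    # hand this to your DQN training loop
```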

In Fuzzy-RL, you can inject your knowledge in the form of rules; there is no need for a tagged dataset. It is mostly used in continuous state/action spaces. If you know someone who knows how to trade, it is a good way to inject his/her knowledge into the agent.
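This is not real Fuzzy-RL (which integrates the rules into the learning itself, e.g. fuzzy Q-learning), but to give an idea of what "knowledge as rules" might look like, here is a tiny hand-rolled sketch that turns a trader's rule of thumb into fuzzy memberships and uses them to bias the agent's action scores. The indicator, membership shapes, and blending weight are all illustrative.

```python
import numpy as np

# Hand-rolled fuzzy rule sketch: encode a rule of thumb
# ("if the recent return is very negative, prefer buying; if very positive,
# prefer selling") as fuzzy memberships that bias the agent's action scores.


def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)


def fuzzy_action_bias(recent_return_pct):
    low = tri(recent_return_pct, -10, -5, 0)    # "price dropped a lot" -> buy
    high = tri(recent_return_pct, 0, 5, 10)     # "price rose a lot"    -> sell
    return np.array([high, low])                # [sell_bias, buy_bias]


# Example: blend the rule with the agent's own action values.
q_values = np.array([0.2, 0.1])                 # [sell, buy], from the agent
biased = q_values + 0.5 * fuzzy_action_bias(-6.0)
action = int(np.argmax(biased))                 # here 1, i.e. Buy
```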

sword134 commented 4 years ago

I think Fuzzy-RL would be off the table for 90% of traders, since the reason they come to RL is to find strategies, not to optimize existing profitable ones.

I think what you describe as transfer learning is what I am talking about. I'm glad you brought it up, because I didn't know it existed; I will take a deeper look into it. I still believe that if one applied transfer learning to trading, the results would be better and come faster.

AminHP commented 4 years ago

Yes, transfer learning generally makes both supervised and unsupervised learning faster. But in RL it is way more complicated, especially when you are using newer algorithms like PPO.

By the way, I like your idea. Let me know if you find a way to apply transfer learning to this problem :)