(Issue closed by ghost 5 years ago)
Hey mate, yeah, this is work in progress, especially the back-testing. I made a commit 2fed29c with a simple implementation of how it is supposed to work, but there is still some work to do. If you update and run again you'll see something like this:
You can see that it is running through the loaded data and taking actions: a green '^' is a buy, a red 'v' is a sell. What is happening in this example is that the agent learned to sell in every position; this can happen because of insufficient training or an unclear environment. Unfortunately, I think it is the latter.
But maybe you can help with something I'm not seeing.
Also, the data that was loaded is outdated (cryptocurrency_prediction/datasets/LTC_1d_2018-11-01_2019-03-18.csv). The data I'm using now comes from cryptocompare_api.py or from the app (app/src/store/mutations.js).
I hope this clarifies things a little; if not, feel free to keep chatting. Try to implement it your own way and keep following, because I'm updating almost every day. Thank you.
OK, I understand, thank you. I've been training on the BTC-EUR data that cryptocompare_api.py generates all day today. I've been getting mixed results, as below, so I will try different datasets and play with the features to see what happens...
After doing some tests, I believe you are using a learning rate that is too big, so the agent starts to overshoot the gradient ascent steps. You can use the grid_search in train_trading_bot.py to test different learning rates.
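To make the idea concrete, here is a minimal sketch of what a learning-rate sweep with Ray Tune's grid_search looks like, written as an old-style experiment spec. The environment id, experiment name and step budget are illustrative placeholders, not the project's actual values in train_trading_bot.py:

```python
# Sketch of a learning-rate grid search in Ray Tune's experiment-spec form.
# "TradingEnv-v0", "lr_sweep" and the step budget are hypothetical names.
experiment = {
    "lr_sweep": {
        "run": "PPO",
        "env": "TradingEnv-v0",
        "stop": {"timesteps_total": 1_000_000},
        "config": {
            # Sweep a few orders of magnitude: a learning rate that is too
            # large makes the policy-gradient updates overshoot, while one
            # that is too small learns nothing in the step budget.
            "lr": {"grid_search": [1e-3, 1e-4, 1e-5]},
        },
    },
}

# With Ray installed, this spec would be launched roughly as:
#   import ray
#   from ray.tune import run_experiments
#   ray.init()
#   run_experiments(experiment)
```

Ray Tune expands the `grid_search` entry into one trial per value, so the three learning rates train and log side by side.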
Here are some tests with the respective learning rates I used:
Hey, I tested the code with three new datasets, generating the data from config/functions.py as the code expects. Grid search on the learning rate was enabled for all of them.
This is what I got from DASHBTC This is what I got from DLTBTC This is what I got from BNBBTC
These are the evaluations respectively.
Evaluations tend to differ across datasets, but in general the agents don't act as they should. I actually got similar results from my other experiments with PPO. Maybe I didn't train enough (4-5 hours with 2 CPUs), but in your screenshot I see 900K steps, and I did more than 1M for all of them. So I want to ask: how, and from which datasets, did you get positive results?
By the way, I am actually working (day and night, for several months) on creating a profitable RL trading algorithm and have been testing different projects, algorithms and trading environments, as well as building my own. I got some results from my own experiments using DQN with LSTM and DDPG with different hyperparameters, but so far I could not trust any of them for live trading. However, I clearly understood that the key is hyperparameter search.
I have generally used TensorFlow/Keras with Stable Baselines and Coach, and I was thinking of creating something that runs all those parameters through one framework... so I just stumbled upon Ray Tune's grid search in your project.
So what I'm thinking is to create something that runs through all policy/value-based algorithms, related neural network architectures, config hyperparameters, features and timeframes, to find out which combinations give trustworthy results for most of the pairs.
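The kind of cross-algorithm sweep described above can be sketched as one Ray-Tune-style experiment spec per algorithm, each sharing a hyperparameter grid. The trainer names match RLlib's; the environment id and the particular grids are illustrative assumptions, not a tested setup:

```python
# Hypothetical sweep over several RLlib trainers, each with its own
# hyperparameter grid. "TradingEnv-v0" and the grid values are placeholders.
SWEEP = {
    algo: {
        "run": algo,                          # RLlib trainer name
        "env": "TradingEnv-v0",
        "stop": {"timesteps_total": 1_000_000},
        "config": {
            "lr": {"grid_search": [1e-3, 1e-4, 1e-5]},
            "gamma": {"grid_search": [0.95, 0.99]},
        },
    }
    for algo in ["PPO", "DQN", "A2C"]         # policy- and value-based
}
```

Passing a dict like this to `run_experiments` would launch every (algorithm, lr, gamma) combination as its own trial, which is the brute-force comparison described above; features and timeframes would have to be swept at the dataset-generation layer instead.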
I also have a simple environment like yours and a features file like you do. Today I created a simple training script to start developing from tomorrow. You can check it here: https://gist.github.com/canercak/b84cf452385ff7f5e28d2900b62e7204
I'm writing this because I see that we are trying to do similar things (you are using 70 features, grid search and policy gradients). If you are interested, I believe we can discuss/brainstorm to create something that works. I can say that I need help with NN architectures. You can DM me on Telegram @canercak anytime.
Hello bro, nice to see your good insights
First of all, responding to your question: the screenshot I sent earlier is from ETHBTC, daily data, 181 datapoints. Despite the good results in training, the models don't perform well when evaluated. I'm still trying to get a good model to use as a benchmark.
I'm training/testing DASHBTC right now, but it seems not to learn anything at all (maybe some dataframes just don't have any pattern to learn?). I also made a couple of changes to the environment at line 85 (the way the reward is computed).
A little tip for you: if you've changed the environment, be sure to update the name it is registered under; otherwise you'll keep using the old environment. I was having problems until I figured this out.
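The pitfall above comes from gym resolving environments by their registered id, so editing the class while reusing the old id can keep serving the stale registration. A minimal sketch of registering under a bumped id, with a stand-in env class (the names here are illustrative, not the project's actual env):

```python
import gym
from gym.envs.registration import register

class TradingEnv(gym.Env):
    """Minimal stand-in for the project's trading environment."""
    def __init__(self):
        self.action_space = gym.spaces.Discrete(3)  # buy / sell / hold
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(70,))

# Register the changed class under a *new* id (here the version suffix is
# bumped from v0 to v1); gym.make looks up environments by id, so keeping
# the old id would keep constructing whatever was registered under it.
register(id="TradingEnv-v1", entry_point=TradingEnv)

env = gym.make("TradingEnv-v1")
```

Bumping only the `-vN` suffix keeps the change visible while making it impossible to accidentally train against the old environment.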
About the Ray Tune script that tests different hyperparameters: I'm totally optimistic about it, I think it's a great idea, especially with the hyperparams features in get datasets.
I want to give more options to the agent; I think this will help it make a profit. My idea is to have a portfolio: instead of a single currency and its features in the csv file, I want many currencies (preferably negatively correlated) and their respective features. In the environment, the agent would then have actions to buy, sell or hold any of these coins. I'm also thinking of changing the project a bit to use forex data instead of just cryptocurrencies, because of forex's volume. Cryptos have relatively low volume, which makes them very volatile/unpredictable.
I'm still working to make this bad boy profitable and I appreciate your help, thanks. I'll send you a message so you have my contact on Telegram.
OK. I implemented what I was talking about very basically and stumbled upon Rainbow. I used the pong-rainbow tuned parameters I got from the Ray GitHub repo, as below. I did not use the grid search:
elif algo == "RAINBOW":
    # Rainbow = DQN with distributional Q (num_atoms), noisy nets, n-step
    # returns and prioritized replay; values are from Ray's tuned
    # pong-rainbow example, not tuned for trading data.
    params = {
        "run": "DQN",
        "env": ENV_NAME,
        "stop": {"timesteps_total": TRAIN_TIMESTEPS},
        "checkpoint_freq": 100,
        "checkpoint_at_end": True,
        "config": {
            "num_atoms": 51,                   # C51 distributional Q-learning
            "noisy": True,                     # noisy nets replace epsilon-greedy...
            "gamma": 0.99,
            "lr": 0.0001,
            "hiddens": [512],
            "learning_starts": 10,
            "buffer_size": 50000,
            "sample_batch_size": 4,
            "train_batch_size": 32,
            "schedule_max_timesteps": 2000000,
            "exploration_final_eps": 0,        # ...hence epsilon annealed to 0
            "exploration_fraction": 0.000001,
            "target_network_update_freq": 500,
            "prioritized_replay": True,
            "prioritized_replay_alpha": 0.5,
            "beta_annealing_fraction": 0.2,
            "final_prioritized_replay_beta": 1,
            "num_workers": NUM_WORKERS,
            "n_step": 3,
            # Atari preprocessing carried over from the pong config; it
            # probably doesn't fit tabular trading observations.
            "model": {
                "grayscale": True,
                "zero_mean": False,
                "dim": 42
            }
        }
    }
When I was training I could get perfect rewards, as below, for both DLT from 2018 to 2019 and XMR from 2018 to 2019-03.
When I evaluated DLT from the start of 2019-01 to now, I got a 3x return on DLTBTC (which I still cannot believe). When I evaluated XMR from 2019-03 to now, the agent did nothing: no buy or sell signals. So I had to double-check whether the Rainbow parameters would work for DLT in your environment... and I got exactly the opposite results, as below.
However, I realized that the training data was only from 2019-02 to 2019-03, just one month. I ran with --limit 2000, and it used to collect more data, but it did not this time:
train_single_pair.py --symbol DLT --to_symbol BTC --histo hour --limit 2000 --algo DQN
Then I placed my own data to generate a dataset starting from 2018, but that resulted in many runtime errors I could not understand. So anyway, my environment did not work at all with PPO, but it worked with Rainbow for DLT using that config. That's what I got so far. I will work on implementing shorter timeframes like EMA5 and 15 minutes... if you ever try Rainbow with a longer timeframe in your environment, I'm looking forward to hearing how it works. We can also discuss which technical indicators to use; I know them all and have experience using them in old-fashioned trading, but not in RL.
Hey brother,
I updated the project. New environment, new render method and new reward function.
After some tests I saw that the bot can beat the market by a small margin 80% of the time. The other 20% of the time it makes bad decisions or tries to buy a larger amount than it can afford, so the balance becomes negative and the episode ends (this is more an environment problem than an agent problem).
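One way to fix that environment problem is to clamp the buy size to what the current balance affords inside the env's step(), instead of letting the balance go negative and killing the episode. A minimal sketch; the function and its signature are hypothetical, not the project's actual env code:

```python
# Clamp a requested buy so the balance can never go negative.
def clamp_buy(balance: float, price: float, requested_amount: float) -> float:
    """Return the amount actually bought, capped by the available balance."""
    affordable = balance / price          # max units the balance can cover
    return min(requested_amount, affordable)

balance, price = 100.0, 40.0
bought = clamp_buy(balance, price, requested_amount=5.0)
# bought == 2.5: the order is shrunk to spend exactly the full balance,
# rather than overdrawing and terminating the episode.
```

Clamping keeps every episode the same length regardless of the policy, which also makes reward curves between runs easier to compare.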
There is still a lot to improve,
Thanks for your support :+1:
Thank you for sharing your project. I've been testing several projects that use PPO, as well as writing my own, and so far I could not get results; however, when training yours I see steadily increasing rewards, and the code makes sense. So I'm ready to contribute, because as far as I can see, PPO tends not to overfit as much as other algorithms. When I try to evaluate the checkpoints it generates, I see gym's "not implemented" error. Is this because of the work in progress you are doing, or am I doing something wrong?