Draichi / T-1000

:zap: :zap: Deep RL Algotrading with Ray API
https://ray.readthedocs.io/en/latest/index.html
MIT License

work in progress? #7

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

Thank you for sharing your project. I've been testing several projects that use PPO, as well as building my own, and so far I could not get results; when training yours, however, I see steadily increasing rewards and the code makes sense. So I'm ready to contribute to it because, as far as I can tell, the PPO algorithm tends not to overfit as much as others. When I try to evaluate the checkpoints it generates, I get gym's "not implemented" error. Is this because of the work in progress you are doing, or am I doing something wrong?

(gym_trading) canermac-3:cryptocurrency_prediction apple$ rllib rollout /Users/apple/ray_results/default/PPO_Trading-v0_0_2019-04-16_12-38-51strcz86z/checkpoint_40/checkpoint-40 --run PPO --env Trading-v0 --steps 1000

lz4 not available, disabling sample compression. This will significantly impact RLlib performance. To install lz4, run `pip install lz4`.
2019-04-16 19:28:22,976 WARNING worker.py:1406 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
2019-04-16 19:28:22,980 INFO node.py:423 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-04-16_19-28-22_24310/logs.
2019-04-16 19:28:23,106 INFO services.py:363 -- Waiting for redis server at 127.0.0.1:22839 to respond...
2019-04-16 19:28:23,251 INFO services.py:363 -- Waiting for redis server at 127.0.0.1:11763 to respond...
2019-04-16 19:28:23,256 INFO services.py:760 -- Starting Redis shard with 0.86 GB max memory.
2019-04-16 19:28:23,305 INFO services.py:1384 -- Starting the Plasma object store with 1.29 GB memory using /tmp.
2019-04-16 19:28:24,135 WARNING ppo.py:172 -- FYI: By default, the value function will not share layers with the policy model ('vf_share_layers': False).
2019-04-16 19:28:24,375 INFO policy_evaluator.py:278 -- Creating policy evaluation worker 0 on CPU (please ignore any CUDA init errors)
2019-04-16 19:28:24.377440: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/ray/rllib/models/action_dist.py:114: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.random.categorical instead.
WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
/Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:110: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:102: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2019-04-16 19:28:31,488 INFO multi_gpu_optimizer.py:74 -- LocalMultiGPUOptimizer devices ['/cpu:0']
(pid=24330) 2019-04-16 19:28:44,682 WARNING compression.py:20 -- lz4 not available, disabling sample compression. This will significantly impact RLlib performance. To install lz4, run `pip install lz4`.
(pid=24333) 2019-04-16 19:28:44,682 WARNING compression.py:20 -- lz4 not available, disabling sample compression. This will significantly impact RLlib performance. To install lz4, run `pip install lz4`.
(pid=24330) 2019-04-16 19:28:45,941 INFO policy_evaluator.py:278 -- Creating policy evaluation worker 2 on CPU (please ignore any CUDA init errors)
(pid=24330) 2019-04-16 19:28:45.942628: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
(pid=24333) 2019-04-16 19:28:45,928 INFO policy_evaluator.py:278 -- Creating policy evaluation worker 1 on CPU (please ignore any CUDA init errors)
(pid=24333) 2019-04-16 19:28:45.931715: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
(pid=24330) WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=24330) Instructions for updating:
(pid=24330) Colocations handled automatically by placer.
(pid=24333) WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=24333) Instructions for updating:
(pid=24333) Colocations handled automatically by placer.
(pid=24330) WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/ray/rllib/models/action_dist.py:114: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
(pid=24330) Instructions for updating:
(pid=24330) Use tf.random.categorical instead.
(pid=24333) WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/ray/rllib/models/action_dist.py:114: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
(pid=24333) Instructions for updating:
(pid=24333) Use tf.random.categorical instead.
(pid=24330) WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
(pid=24330) Instructions for updating:
(pid=24330) Use tf.cast instead.
(pid=24330) /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:110: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
(pid=24330)   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
(pid=24333) WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
(pid=24333) Instructions for updating:
(pid=24333) Use tf.cast instead.
(pid=24333) /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:110: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
(pid=24333)   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
(pid=24330) WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:102: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
(pid=24330) Instructions for updating:
(pid=24330) Deprecated in favor of operator or tf.math.divide.
(pid=24333) WARNING:tensorflow:From /Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:102: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
(pid=24333) Instructions for updating:
(pid=24333) Deprecated in favor of operator or tf.math.divide.
Traceback (most recent call last):
  File "/Users/apple/miniconda3/envs/gym_trading/bin/rllib", line 10, in <module>
    sys.exit(cli())
  File "/Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/ray/rllib/scripts.py", line 40, in cli
    rollout.run(options, rollout_parser)
  File "/Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/ray/rllib/rollout.py", line 102, in run
    rollout(agent, args.env, num_steps, args.out, args.no_render)
  File "/Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/ray/rllib/rollout.py", line 169, in rollout
    env.render()
  File "/Users/apple/miniconda3/envs/gym_trading/lib/python3.6/site-packages/gym/core.py", line 108, in render
    raise NotImplementedError
NotImplementedError
Draichi commented 5 years ago

Hey mate, yeah, there is work in progress, especially on the back-testing. I made a commit (2fed29c) with a simple implementation of how it is supposed to work, but there is still some work to do. If you update and run again you'll see something like this:

(animated GIF: ezgif-4-5ba30ced0c50)

You can see that it is running through the loaded data and taking actions: a green '^' is a buy, a red 'v' is a sell. What is happening in this example is that the agent learned to sell at every position; this could be due to not enough training or to an unclear environment. Unfortunately, I think it is the latter.
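
A minimal sketch (not the actual 2fed29c code) of the kind of plot described here: the price series with the agent's buys and sells marked on top. The `plot_trades` helper and its inputs are illustrative; the real environment's internals may differ.

    import matplotlib.pyplot as plt

    def plot_trades(prices, trades):
        """Plot close prices and mark each recorded trade.

        prices: list of floats, one per step.
        trades: list of (step, action) tuples, with action in {'buy', 'sell'}.
        """
        plt.plot(prices, color='black', label='close')
        for step, action in trades:
            marker, color = ('^', 'green') if action == 'buy' else ('v', 'red')
            plt.scatter(step, prices[step], marker=marker, color=color)
        plt.legend()
        plt.show()

    # Example: an agent that decided to sell every time it acted
    plot_trades([10.0, 10.5, 10.2, 9.8], [(1, 'sell'), (3, 'sell')])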

But maybe you can help with something I'm not seeing.

Also, the data that was loaded is outdated (cryptocurrency_prediction/datasets/LTC_1d_2018-11-01_2019-03-18.csv). The data I'm working with now comes from cryptocompare_api.py or from the app (app/src/store/mutations.js).

I hope this clarifies things a bit; if not, feel free to keep chatting. Try implementing it your own way and keep following, because I'm updating almost every day. Thank you.

ghost commented 5 years ago

OK, I understand, thank you. I've been training all day today on the BTC-EUR data that cryptocompare_api.py generates. I've been getting mixed results, as below, so I will try different datasets and play with the features to see what happens... Screenshot at Apr 17 22-31-03

Draichi commented 5 years ago

> OK, I understand, thank you. I've been training all day today on the BTC-EUR data that cryptocompare_api.py generates. I've been getting mixed results, as below, so I will try different datasets and play with the features to see what happens... Screenshot at Apr 17 22-31-03

After doing some tests, I believe you are using a learning rate that is too large, so the agent starts to overshoot the gradient ascent steps. You can use the grid_search in train_trading_bot.py to test different learning rates.
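
For reference, a minimal sketch of what such a learning-rate sweep can look like with Ray Tune's grid_search; the contents of train_trading_bot.py aren't shown in this thread, so the experiment name, stop condition, and learning-rate values below are illustrative.

    import ray
    from ray import tune

    ray.init()
    tune.run_experiments({
        "ppo_lr_sweep": {                      # illustrative experiment name
            "run": "PPO",
            "env": "Trading-v0",               # env name from the rollout command above
            "stop": {"timesteps_total": 1000000},
            "checkpoint_at_end": True,
            "config": {
                # Tune launches one PPO trial per learning rate in the list
                "lr": tune.grid_search([5e-4, 1e-4, 5e-5, 1e-5]),
                "num_workers": 2,
            },
        }
    })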

Here are some tests with the respective learning rates I used: tensorboard

ghost commented 5 years ago

Hey, I tested the code with three new datasets, generating the data from config/functions.py as the code expects. Grid search on the learning rate was used for all of them.

This is what I got from DASHBTC: Screenshot at Apr 28 15-39-16
This is what I got from DLTBTC: Screenshot at Apr 28 15-38-29
This is what I got from BNBBTC: Screenshot at Apr 28 22-46-48

These are the evaluations, respectively: Screenshot at Apr 28 19-51-37, Screenshot at Apr 28 19-49-48, Screenshot at Apr 28 23-32-59

Evaluations differ from dataset to dataset, but in general the agents don't act as they should. I actually got similar results from my other experiments with PPO. Maybe I didn't train it enough (4-5 hours with 2 CPUs), but in your screenshot I see 900K timesteps, and I did more than 1M for all of them. So I want to ask: how, and on which datasets, did you get positive results?

By the way, I have been working (day and night, for several months) on creating a profitable RL trading algorithm and have been testing different projects, algorithms, and trading environments, as well as building my own. I got some results from my own experiments using DQN with LSTM and DDPG with different hyperparameters, but so far I could not trust any of them for live trading. However, I clearly understood that the key is hyperparameter search.

I have mostly used TensorFlow/Keras with Stable Baselines and Coach, and I was thinking of creating something that runs all those parameters through one framework... and then I stumbled upon Ray Tune's grid search in your project.

So what I'm thinking is to create something that runs through all the policy-based and value-based algorithms, related neural network architectures, config hyperparameters, features, and timeframes to find out which combinations give trustworthy results for most of the pairs.
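
As a rough sketch (not code from either repo), that kind of sweep can be expressed with Ray Tune by building one experiment per RLlib algorithm over the same environment; the algorithm list and settings are illustrative.

    import ray
    from ray import tune

    ALGORITHMS = ["PPO", "DQN", "A2C", "IMPALA"]   # standard RLlib trainer names

    ray.init()
    tune.run_experiments({
        algo.lower() + "_trading": {
            "run": algo,
            "env": "Trading-v0",
            "stop": {"timesteps_total": 2000000},
            "checkpoint_at_end": True,
            "config": {"num_workers": 2},
        }
        for algo in ALGORITHMS
    })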

I also have a simple environment like yours and a features file like you do. Today I created a simple training script to develop on, starting tomorrow. You can check it here: https://gist.github.com/canercak/b84cf452385ff7f5e28d2900b62e7204

I'm writing this because I see that we are trying to do similar things (you are using 70 features, grid search, and policy gradients). If you are interested, I believe we can discuss/brainstorm to create something that works. I can say that I need help with NN architectures. You can DM me on Telegram @canercak anytime.

Draichi commented 5 years ago

Hello bro, nice to see your good insights.

First of all, responding to your question: the screenshot I sent earlier is from ETHBTC, daily data, 181 datapoints. Despite the good results in training, the models don't perform well when evaluated. I'm still trying to get a good model to use as a benchmark.

I'm training/testing DASHBTC right now, but it seems not to learn anything at all (maybe some dataframes just don't have any pattern to learn?). I also made a couple of changes to the environment at line 85 (the way the reward is computed).

A little tip for you: if you've changed the environment, be sure to update the name it is registered under, otherwise you'll keep using the old environment. I was having problems until I figured this out.
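
A minimal sketch of that registration gotcha, assuming a hypothetical env class named TradingEnv: RLlib looks the environment up by the registered string, so after changing the class it helps to re-register, ideally under a bumped name.

    from ray.tune.registry import register_env

    def env_creator(env_config):
        # Import inside the creator so each Ray worker constructs its own copy.
        from trading_env import TradingEnv   # hypothetical module/class name
        return TradingEnv(env_config)

    # Bumping the registered name ("Trading-v1") makes it obvious which
    # version of the environment the trainer is actually running.
    register_env("Trading-v1", env_creator)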

About the Ray Tune script that tests different hyperparameters: I'm totally optimistic about it, I think it's a great idea, especially if the features used to build the datasets are treated as hyperparameters too.

I want to give more options to the agent; I think this will help it make a profit. My idea is to have a portfolio: instead of a single currency and its features in the csv file, I want to have several currencies (preferably negatively correlated) and their respective features. In the environment, the agent would then have the action of buying, selling, or holding any of these coins. I'm also thinking of changing the project a bit to use forex data instead of just cryptocurrencies, because of forex's volume. Cryptos have relatively low volume, which makes them very volatile/unpredictable.
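
A sketch of what that portfolio-style action space could look like in gym terms, with one {hold, buy, sell} decision per asset instead of a single-currency action; the class name and constants are illustrative, not from the repo.

    import numpy as np
    import gym
    from gym import spaces

    N_ASSETS = 3      # e.g. three preferably negatively correlated currencies
    N_FEATURES = 70   # feature columns per asset, as in the current csv setup

    class PortfolioTradingEnv(gym.Env):
        """Illustrative skeleton: only the spaces are sketched here."""

        def __init__(self):
            # One discrete choice per asset: 0 = hold, 1 = buy, 2 = sell
            self.action_space = spaces.MultiDiscrete([3] * N_ASSETS)
            # Observation: an (assets x features) slice of market data
            self.observation_space = spaces.Box(
                low=-np.inf, high=np.inf,
                shape=(N_ASSETS, N_FEATURES), dtype=np.float32)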

I'm still working to make this bad boy profitable and I appreciate your help, thanks. I'll send you a message so you have my contact on Telegram.

ghost commented 5 years ago

OK. I implemented what I was talking about very basically and stumbled upon Rainbow. I was using the pong-rainbow tuned parameters I got from the Ray GitHub repo, as below; I did not use the grid search:

    elif algo == "RAINBOW":
        params = {
                    "run": "DQN",
                    "env": ENV_NAME,   
                    "stop": {
                        "timesteps_total": TRAIN_TIMESTEPS, 
                    },
                    "checkpoint_freq": 100,
                    "checkpoint_at_end": True,
                    "config": {
                        "num_atoms": 51,
                        "noisy": True,
                        "gamma": 0.99,
                        "lr": 0.0001,
                        "hiddens": [
                            512
                        ],
                        "learning_starts": 10,
                        "buffer_size": 50000,
                        "sample_batch_size": 4,
                        "train_batch_size": 32,
                        "schedule_max_timesteps": 2000000,
                        "exploration_final_eps": 0,
                        "exploration_fraction": 0.000001,
                        "target_network_update_freq": 500,
                        "prioritized_replay": True,
                        "prioritized_replay_alpha": 0.5,
                        "beta_annealing_fraction": 0.2,
                        "final_prioritized_replay_beta": 1,
                        "num_workers": NUM_WORKERS,  
                        "n_step": 3,
                        "model": {
                            "grayscale": True,
                            "zero_mean": False,
                            "dim": 42
                        }
                    } 
                }
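
(The surrounding script isn't shown, so as a hedged guess this is roughly how a params block like the one above gets launched with Tune. The Rainbow-specific pieces are the keys already visible in the config: distributional Q via num_atoms=51, noisy nets, prioritized replay, and n_step returns on top of plain DQN.)

    import ray
    from ray import tune

    ray.init()
    # `params` is the dict built in the elif branch above;
    # the experiment name "rainbow_trading" is illustrative.
    tune.run_experiments({"rainbow_trading": params})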

When training, I could get perfect rewards, as below, for both DLT from 2018 to 2019 and XMR from 2018 to 2019-03. Screenshot at Apr 30 14-00-21

When I evaluated DLT from the start of 2019-01 to now, I got a 3-fold return on DLTBTC (which I still cannot believe). When I evaluated XMR from 2019-03 to now, the agent did nothing: no buy or sell signals. So I had to double-check DLT to see whether the Rainbow parameters would work in your environment, and I got exactly the opposite results, as below.

Screenshot at Apr 30 18-36-26

However, I realized that the training data was only from 2019-02 to 2019-03... just one month. I ran with --limit 2000, and it used to collect more data than that, but it did not this time:

train_single_pair.py --symbol DLT --to_symbol BTC --histo hour --limit 2000 --algo DQN

Then I used my own data to generate a dataset starting from 2018, but that resulted in many runtime errors I could not understand. So, anyway: my environment did not work at all with PPO, but it worked with Rainbow for DLT with that config. That's what I've got so far. I will work on implementing shorter timeframes, like EMA5 and 15 minutes. If you ever try Rainbow with a longer timeframe in your environment, I'm looking forward to your reply on how it works. We can also discuss which technical indicators to use; I know them all and have experience using them in old-fashioned trading, but not in RL.

Draichi commented 5 years ago

Hey brother,

I updated the project. New environment, new render method and new reward function.

After some tests I saw that the bot can beat the market by a small margin 80% of the time. The other 20% of the time it makes bad decisions or tries to buy a larger amount than it can afford, so the balance becomes negative and the episode ends (this is more an environment problem than a problem with the agent itself).

There is still a lot to improve.

Thanks for your support :+1: