@shokimble, just use the tf.squeeze() method to get rid of that redundant dimension (for a tensor), or [0, :, :] indexing for a numpy array.
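For example, both options in a minimal numpy sketch (the (1, 4, 30) shape is taken from the error message quoted further down):

```python
import numpy as np

# Stand-in for a single observation with a redundant leading axis, shape (1, 4, 30).
obs = np.zeros((1, 4, 30))

state = np.squeeze(obs, axis=0)  # numpy analogue of tf.squeeze(); shape (4, 30)
state = obs[0, :, :]             # equivalent plain indexing;      shape (4, 30)
print(state.shape)               # -> (4, 30)
```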
But A3C right from the start... that's quite a complex algorithm with a lot of moving parts. I don't think it will work with btgym right out of the box; at the very least:
1. You should set a different environment communication port for every worker, otherwise things will get messed up, since A3C launches a separate environment [and server] instance for each one (see the sketch after this list).
2. It is essential to featurise and normalize the environment state representation in some way; the original price signal is unbounded and non-stationary (one common option is also sketched below).
3. A3C is [comparatively] fast but less sample-efficient than the DQN family; keep that in mind.
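A minimal sketch of point 1 (hedged: the exact BTgymEnv constructor signature, the `port` kwarg, and the dataset path are assumptions on my part, so check them against your btgym version):

```python
# Sketch: one btgym environment (and server) instance per A3C worker,
# each bound to its own communication port so the workers don't collide.
from btgym import BTgymEnv

BASE_PORT = 5000  # arbitrary free base port

def make_worker_env(worker_id):
    return BTgymEnv(
        filename='./data/DAT_ASCII_EURUSD_M1_2016.csv',  # hypothetical dataset path
        port=BASE_PORT + worker_id,  # unique port per worker instance
        verbose=0,
    )

envs = [make_worker_env(i) for i in range(8)]  # e.g. 8 async workers
```

And for point 2, one common approach (my suggestion, not something prescribed in this thread) to turn the unbounded, non-stationary price signal into something bounded and roughly stationary is to feed log returns instead of raw prices:

```python
import numpy as np

def to_log_returns(prices):
    # First differences of log prices: bounded in practice and far
    # closer to stationary than the raw price series itself.
    prices = np.asarray(prices, dtype=np.float64)
    return np.diff(np.log(prices))
```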
I plan to add support for async training, but it will take time.
For now I can give you a DDQN implementation for btgym. It's working but unfinished, which is why I haven't published it in `examples` yet; it has a suboptimal estimator architecture (good for Atari).
At least it runs, showcases state and reward preparation, and can converge to roughly zero episode drawdown (though no wins) in around 600-800k steps.
Download it here: https://yadi.sk/d/9140mMaC3Kx8aL
Like I said, still learning 😄
Squeeze looks like a no-go; it's still not working.
I have point 1 sorted: a new port for each instance. It might be point 2 that's the problem.
I'll try your example.
I wouldn't expect any RL to be able to beat the market (especially in FX), and that really wasn't my starting point. I'm interested in applications around the periphery, like position sizing, risk management, etc.
I'm also very interested in games, but that's a different (if related) ballgame. You might have noticed my posts about trying to get StreetFighter 2 working. I've spent too much of my life trying to master that game, and the reason it's so hard is that it was originally built with a simple feedback AI, so being able to beat it is also the kind of accomplishment I'm after.
@shokimble, since a single state has shape (4, 30), the batch input tensor should be (None, 4, 30).
Change here:

```python
# -- main
env_test = Environment(render=True, instancenumber=0, eps_start=0., eps_end=0.)
print(env_test.env.observation_space.shape)
(NUM_STATE_0, NUM_STATE_1) = env_test.env.observation_space.shape
```

and in class Brain:

```python
def _build_model(self):
    l_input = Input(batch_shape=(None, NUM_STATE_0, NUM_STATE_1))
    ...
```
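One caveat I'd add here (my own assumption, not something stated in the thread): the downstream Dense layers in the CartPole script expect a flat feature vector, so with a 3-D input you will likely also need a Flatten() before the policy and value heads. A rough Keras sketch of _build_model in the style of that script (NUM_STATE_0, NUM_STATE_1, and NUM_ACTIONS are module-level constants there):

```python
from keras.layers import Dense, Flatten, Input
from keras.models import Model

def _build_model(self):
    # Batch of (NUM_STATE_0, NUM_STATE_1) observations, e.g. (None, 4, 30).
    l_input = Input(batch_shape=(None, NUM_STATE_0, NUM_STATE_1))
    l_flat = Flatten()(l_input)                     # collapse to (None, 4 * 30)
    l_dense = Dense(16, activation='relu')(l_flat)

    out_actions = Dense(NUM_ACTIONS, activation='softmax')(l_dense)  # policy head
    out_value = Dense(1, activation='linear')(l_dense)               # value head

    model = Model(inputs=[l_input], outputs=[out_actions, out_value])
    model._make_predict_function()  # as in the original script: initialize before threading
    return model
```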
Hi,
I finally got around to trying a real "deep learning" implementation against btgym, and I've run up against a problem. I don't know enough about OpenAI Gym to fully understand what the problem is, but based on the implementation I'm trying, and on observing other implementations, it seems that observation_space.shape returns inconsistent values.
I tried modifying this basic A3C implementation: https://github.com/jaara/AI-blog/blob/master/CartPole-A3C.py
I've attached my attempt below but when I run it I get the error:
ValueError: Error when checking : expected input_1 to have 2 dimensions, but got array with shape (1, 4, 30)
When I print observation_space.shape, it seems to change all the time, which is expected, but I believe it should always follow the same format (again, I'm a bit of a newbie with this stuff, so I could be wrong).
Are you able to take a look? I'll keep digging and try another implementation; I'm thinking a DQN implementation. The problem with most DQN or A3C implementations is that they rely on the state being a screenshot, i.e. an array of rows and columns, sometimes represented as RGB and often just as 0s and 1s.
Keep up the good work BTW.