@parrondo,
Do you have some code to test predictions with the trained model?
Yes, it is in 'beta' and at present it looks like this: you split the entire dataset into source (~train) and target (~test) domains by passing the corresponding specifications to the data provider class, see the diagram here: https://kismuz.github.io/btgym/intro.html#data-flow-structure and one class example here: https://kismuz.github.io/btgym/btgym.datafeed.html#btgym.datafeed.derivative.BTgymRandomDataDomain
The AAC framework class allows you to specify the train/test cycle via the episode_train_test_cycle arg:
episode_train_test_cycle – tuple or list as (train_number, test_number), default=(1, 0): enables an infinite loop such as: run train_number train-data episodes, then test_number test-data episodes, repeat. Should be consistent with the provided dataset parameters (test data should exist if test_number > 0).
see: https://kismuz.github.io/btgym/btgym.algorithms.html#module-btgym.algorithms.aac
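A minimal sketch of that setup, assuming the parameter names shown in the linked docs (the exact BTgymRandomDataDomain kwargs and the CSV path below are illustrative, not a verified recipe):

```python
from btgym.datafeed.derivative import BTgymRandomDataDomain

# Source/target domain split: data inside target_period is held out as the
# target (test) domain, the rest is the source (train) domain.
# Kwarg names follow the docs linked above and may lag behind the code.
domain = BTgymRandomDataDomain(
    filename='./data/my_1min_data.csv',  # hypothetical CSV path
    target_period={'days': 30, 'hours': 0, 'minutes': 0},
    trial_params=dict(
        sample_duration={'days': 30, 'hours': 0, 'minutes': 0},
        test_period={'days': 7, 'hours': 0, 'minutes': 0},  # within-trial test split
    ),
    episode_params=dict(
        sample_duration={'days': 1, 'hours': 23, 'minutes': 55},
    ),
)

# Trainer side: run 10 train-data episodes, then 5 test-data episodes, repeat.
trainer_config = dict(
    episode_train_test_cycle=(10, 5),
    # ... other AAC kwargs ...
)
```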
Do you have some code to extract weights from the checkpoint file and deploy the model? If not, could you provide some guideline instructions on how to do that?
See #40; the btgym.algorithm.worker class provides checkpoint loading functionality via standard TensorFlow methods; you can exploit the trained model as you wish by modifying the .process() method of btgym.algorithms.aac.BaseAAC or one of its subclasses;
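For the 'extract weights' part, the plain TF1 checkpoint restore the worker relies on looks roughly like this (the checkpoint path is an example, and the policy graph must already be built in the session before restoring):

```python
import tensorflow as tf

# Rebuild or reuse the policy graph first, then load the trained weights from
# the checkpoint directory written during training (path below is illustrative).
saver = tf.train.Saver(var_list=tf.trainable_variables())
with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('./tmp/my_experiment/current_train_checkpoint')
    saver.restore(sess, ckpt)
    # from here you can run the policy forward pass for deployment, or wrap
    # this logic inside an overridden BaseAAC.process() as suggested above
```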
How do you plan to implement the production test?
I'm currently trying to achieve at least a 'promising' degree of generalisation on test data by implementing meta-learning algorithms based on the AAC framework, so I'd go for production tests afterwards;
PS: the data iterator classes can seem a bit over-complicated at first, but they are actually an attempt to properly formulate and implement a meta-learning objective. See https://github.com/Kismuz/btgym/blob/master/docs/papers/btgym_formalism_draft.pdf for formal definitions and the example notebook: https://github.com/Kismuz/btgym/blob/master/examples/data_domain_api_intro.ipynb
Kismuz, Thank you very much for your reply. I am still testing.
Hi Kismuz,
I have the same question. In my experience, stochastic forecasting with LSTMs does not get good results, since it does not predict more than a line. Are you getting good testing results on prediction? I am checking whether I can implement an autoregressive model or any other kind of regressor inside the RL algorithm, and how.
By the way, awesome job, it's getting better by the week. Congrats.
@gaceladri,
Are you getting good testing results on prediction?
In short: no.
Strictly speaking, it is correct to talk about policy generalisation, not about any prediction, because here we map states directly to actions without any explicit models or predictions about future states. But in a nutshell, policies learnt by 'generic' Q-value iteration or policy gradient methods do not generalise well even under minor task shift. That's why the community is raving about meta-learning.
@gaceladri,
autoregressive model or any other kind of regressor inside the RL algorithm
I can't grasp the idea, can you explain in more detail what you mean?
Strictly speaking, it is correct to talk about policy generalisation, not about any prediction
OK, you are right about that. Nevertheless, we are trying to get the optimal policy function which reaches the maximum reward in a very simple trading environment. That is, the available actions are only BUY and SELL, with their closing counterparts, and DO NOTHING. So, "non-strictly speaking", we could say we have a "model" (some kind of MDP) in the "trading domain" to predict when the price will go up (BUY, close short) and go down (SELL, close long). That said, I agree to use the more correct expression "policy" and not "model" when we are in the "reinforcement learning domain".
That's why the community is raving about meta-learning
As you have been developing your project, you have finally implemented some kind of meta-learning methods. I see two packages, "metalearn_2" and "mldg", under the research folder. So, I figure that "mldg" is the method from Li, "Learning to Generalize: Meta-Learning for Domain Generalization". If it is operative, could you provide a sample of trainer_config() and policy_config() and whatever other config is needed for the meta-trainer? A Jupyter notebook would be welcome.
Maybe you could take a look at this paper: Deep Meta-Learning: Learning to Learn in the Concept Space https://arxiv.org/pdf/1802.03596.pdf
It is hard to implement but offers the interesting advantage of automatically extracting concept-level features.
Yup, exactly. I'm trying to implement it and have accidentally pushed the MLDG branch to GitHub :)
I currently have several versions, but none of them have shown any noticeable generalisation results. If you are interested I can push them along with training notebooks, but be warned it is unfinished work with a lot of unpolished code.
Thank you, Kismuz. If you are so kind, I would like to test your training notebooks even with that warning. I am trying to become familiar with your logic for new implementations.
@parrondo I have pushed a separate temp. branch containing the MLDG code and notebooks: https://github.com/Kismuz/btgym/tree/develop_meta_learning_gradient
Some notes:
there is no learnable inner-optimiser update rate implemented yet, so one needs to play with the fast_opt_learn_rate param in ~[0.1, 0.0001];
uses a guided policy search loss which speeds up training, esp. at initial stages; annealed to zero over 10M steps; can be disabled by setting guided_lambda=0 in the trainer_config dict (see the sketch after these notes);
one needs either a lot of patience or a lot of cores, as training is almost 2 times slower;
There are two-and-a-half versions of the algorithm, differing in the way MDP tasks are defined:
aac.MLDG class and notebook a_MLDG; train_support, num_train_updates;
aac_2.MLDG_d class and notebook a_MLDGd;
aac_1.AMLDG_1 class and a_MLDG_1 nb.; it can be thought of as splitting the episode trajectory into smaller partial trajectories (rollouts) and conditioning every sub-trajectory on the previous one. TODO: can make a distribution like above; it would be a local replay buffer which is reset at the beginning of every episode. So it is a kind of closed-loop optimisation within a single episode.
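Purely illustrative: how the two knobs mentioned in the notes above would sit in the trainer_config dict passed to the launcher (assuming both are plain trainer kwargs; all other keys omitted):

```python
trainer_config = dict(
    # ...
    fast_opt_learn_rate=1e-3,  # inner (fast) optimiser step; try values in ~[0.1, 0.0001]
    guided_lambda=0.0,         # 0 disables the guided policy search loss term
)
```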
Thank you. Testing!
@parrondo et al.:
Some additions to the MLDG branch pushed:
guided_lambda should be raised to ~5.0 to keep up sufficient exploration; all MLDG variants affected.
OK Kismuz. Thank you. I am testing that.
Your whole framework is awesome. Maybe now it is time to get it to work for trading. I should start by reproducing some published result. This is a very interesting one: https://arxiv.org/abs/1706.10059 And it is interesting because it has established the neural net, the instruments and the full conditions, avoiding the hard work of looking for them. Obviously each trader must research their very best strategy parameters, but it is a minimum to reproduce some published results in order to be sure that we are not off track. The main obstacle is to get a multi-instrument data feed, which is not yet implemented in BTGym.
So, 1) What do you think about reproducing those results? 2) If not, are you trying to reproduce any results before you reach your own?
@Kismuz I am checking this one: https://arxiv.org/abs/1707.03141 (its v3 version), "A simple Neural attentive meta-learner"; the previous version was "Meta-learning with temporal convolutions", which is the meta-learning version of the WaveNet model. Here you can check an implementation that does not reach 99% but achieves 95% on 5-shot: https://github.com/devsisters/TCML-tensorflow .
Edit: https://arxiv.org/abs/1804.03782 have a look at this paper. I am studying the complexity of implementing your btgymserver with Twisted and making the environment decentralised. What do you think, Kismuz? Imagine a pool of decentralised agents ☺
@Kismuz This is what I wanted to say: https://arxiv.org/abs/1611.01779 I don't know if you are currently doing this in order to take the actions. For this reason I said that an LSTM is not the solution, because we have a stochastic prediction and LSTMs do not work well under these conditions. Maybe temporal convolutions could help, but I will definitely check the results of the algorithm when it makes predictions into the future, not just the current state.
@gaceladri,
"A simple Neural attentive meta-learner"
Yes, I'm working on it; at least on the temporal convolutions encoder; haven't got down to attention yet; a naive WaveNet implementation is painfully slow, so the Fast WaveNet algorithm should be OK;
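Not BTgym code, just a toy sketch of the dilated causal convolution stack (WaveNet-style encoder) being discussed, written with tf.keras layers:

```python
import tensorflow as tf

def temporal_conv_encoder(x, filters=32, dilations=(1, 2, 4, 8)):
    """x: [batch, time, channels] observation window -> encoded sequence."""
    h = x
    for d in dilations:
        # causal padding keeps the encoder from peeking into the future
        h = tf.keras.layers.Conv1D(
            filters, kernel_size=2, dilation_rate=d,
            padding='causal', activation='relu')(h)
    return h  # receptive field = 1 + sum(dilations) time steps
```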
...make the environment decentralised.
I think it could be done.
Good paper indeed; note that the setup described naturally fits the BTgym architecture: observation['external'] can be thought of as the sensory input and observation['internal'] as the measurements, using the authors' notation;
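In code that mapping is just the two standard modes of BTgym's observation dict (the names on the right are the paper's terminology, not BTgym API):

```python
# obs is a single observation dict returned by the BTgym environment
sensory_input = obs['external']   # market data window
measurements = obs['internal']    # broker / account state statistics
```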
The problem is that our task boils down exactly to 'predicting sensory input', and '...Prediction of full sensory input in realistic three-dimensional environments remains an open challenge...', as the authors admit; still a very interesting approach, worth trying to implement;
@parrondo,
What do you think about reproducing those results?
This work received a lot of attention and I think some people have already done this job, but there are some ideas from the paper I plan to implement, particularly EIIE, time-decaying data sampling and the rolling test period;
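A back-of-the-envelope sketch of the 'time-decaying data sampling' idea from that paper (geometrically biasing episode starts towards recent data; not BTgym code, and the decay parameter is arbitrary):

```python
import numpy as np

def sample_episode_start(num_points, beta=1e-4, rng=np.random):
    """Pick an episode start index, biased geometrically towards recent data."""
    age = min(rng.geometric(beta), num_points) - 1  # 0 = most recent point
    return num_points - 1 - age
```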
If not, are you trying to reproduce any results before you reach your own?
All this work is about reproducing others' results, actually :)
OK Kismuz, I mean trading paper results. Which one are you trying to reproduce?
Hi Kismuz, first of all, congratulations on this really great job!
OK, I have trained my workers, and here are some questions:
Do you have some code to test predictions with the trained model? Do you have some code to extract weights from the checkpoint file and deploy the model? If not, could you provide some guideline instructions on how to do that? How do you plan to implement the production test?
Thank you in advance.