@parrondo,
Do you have some code to test predictions with the trained model?
Yes, it is in 'beta' and at present it looks like this: you split the entire dataset into source (~train) and target (~test) domains by passing the corresponding specifications to the data provider class, see the diagram here: https://kismuz.github.io/btgym/intro.html#data-flow-structure and one class example here: https://kismuz.github.io/btgym/btgym.datafeed.html#btgym.datafeed.derivative.BTgymRandomDataDomain
The AAC framework class allows you to specify the train/test cycle via the episode_train_test_cycle arg:
episode_train_test_cycle – tuple or list as (train_number, test_number), default=(1, 0): enables an infinite loop such as: run train_number train-data episodes, then test_number test-data episodes, repeat. Should be consistent with the provided dataset parameters (test data should exist if test_number > 0).
see: https://kismuz.github.io/btgym/btgym.algorithms.html#module-btgym.algorithms.aac
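A minimal sketch of that setup, assuming the parameter names shown in the linked docs (the exact BTgymRandomDataDomain kwargs and the CSV path below are illustrative, not a verified recipe):

```python
from btgym.datafeed.derivative import BTgymRandomDataDomain

# Source/target domain split: data inside target_period is held out as the
# target (test) domain, the rest is the source (train) domain.
# Kwarg names follow the docs linked above and may lag behind the code.
domain = BTgymRandomDataDomain(
    filename='./data/my_1min_data.csv',  # hypothetical CSV path
    target_period={'days': 30, 'hours': 0, 'minutes': 0},
    trial_params=dict(
        sample_duration={'days': 30, 'hours': 0, 'minutes': 0},
        test_period={'days': 7, 'hours': 0, 'minutes': 0},  # within-trial test split
    ),
    episode_params=dict(
        sample_duration={'days': 1, 'hours': 23, 'minutes': 55},
    ),
)

# Trainer side: run 10 train-data episodes, then 5 test-data episodes, repeat.
trainer_config = dict(
    episode_train_test_cycle=(10, 5),
    # ... other AAC kwargs ...
)
```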
Do you have some code to extract weights from the checkpoint file and deploy the model? If not, could you provide some guideline instructions on how to do that?
See #40; the btgym.algorithm.worker class provides checkpoint loading functionality via standard TensorFlow methods; you can exploit the trained model as you wish by modifying the .process() method of btgym.algorithms.aac.BaseAAC or one of its subclasses;
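For the 'extract weights' part, the plain TF1 checkpoint restore the worker relies on looks roughly like this (the checkpoint path is an example, and the policy graph must already be built in the session before restoring):

```python
import tensorflow as tf

# Rebuild or reuse the policy graph first, then load the trained weights from
# the checkpoint directory written during training (path below is illustrative).
saver = tf.train.Saver(var_list=tf.trainable_variables())
with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('./tmp/my_experiment/current_train_checkpoint')
    saver.restore(sess, ckpt)
    # from here you can run the policy forward pass for deployment, or wrap
    # this logic inside an overridden BaseAAC.process() as suggested above
```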
How do you plan to implement the production test?
I'm currently trying to achieve at least a 'promising' degree of generalisation on test data by implementing meta-learning algorithms based on the AAC framework, so I'd go for production tests afterwards;
PS: the data iterator classes can seem a bit over-complicated at first, but they are actually an attempt to properly formulate and implement a meta-learning objective. See https://github.com/Kismuz/btgym/blob/master/docs/papers/btgym_formalism_draft.pdf for formal definitions and the example notebook: https://github.com/Kismuz/btgym/blob/master/examples/data_domain_api_intro.ipynb
Kismuz, Thank you very much for your reply. I am still testing.
Hi Kismuz,
I have the same question. In my experience, stochastic forecasting with LSTMs does not get good results, since it does not predict more than a line. Are you getting good testing results on prediction? I am checking whether I can implement an autoregressive model or any other kind of regressor inside the RL algorithm, and how.
By the way, awesome job, it's getting better by the week. Congrats.
@gaceladri,
Are you getting good testing results on prediction?
In short: no.
Strictly speaking, it is correct to talk about policy generalisation, not about any prediction, because here we map states directly to actions without any explicit models or predictions about future states. But in a nutshell, policies learnt by 'generic' Q-value iteration or policy gradient methods do not generalise well even under minor task shift. That's why the community is raving about meta-learning.
@gaceladri,
autoregressive model or any other kind of regressor inside the RL algorithm
I can't grasp the idea, can you explain in more detail what you mean?
Strictly speaking, it is correct to talk about policy generalisation, not about any prediction
OK, you are right about that. Nevertheless, we are trying to get the optimal policy function which reaches the maximum reward in a very simple trading environment. That is, the available actions are only BUY and SELL, with their closing counterparts, and DO NOTHING. So, "non-strictly speaking", we could say we have a "model" (some kind of MDP) in the "trading domain" to predict when the price will go up (BUY, close short) and go down (SELL, close long). That said, I agree to use the more correct expression "policy" and not "model" when we are in the "reinforcement learning domain".
That's why the community is raving about meta-learning
As you have been developing your project, you have finally implemented some kind of meta-learning methods. I see two packages, "metalearn_2" and "mldg", under the research folder. So, I figure that "mldg" is the method from Li, "Learning to Generalize: Meta-Learning for Domain Generalization". If it is operative, could you provide a sample of trainer_config() and policy_config() and whatever other config is needed for the meta-trainer? A Jupyter notebook would be welcome.
Maybe you could take a look at this paper: Deep Meta-Learning: Learning to Learn in the Concept Space https://arxiv.org/pdf/1802.03596.pdf
It is hard to implement but offers the interesting advantage of automatically extracting concept-level features.
Yup, exactly. I'm trying to implement it and have accidentally pushed the MLDG branch to GitHub :)
I currently have several versions, but none of them have shown any noticeable generalisation results. If you are interested I can push them along with training notebooks, but be warned it is unfinished work with a lot of unpolished code.
Thank you, Kismuz. If you are so kind, I would like to test your training notebooks even with that warning. I am trying to become familiar with your logic for new implementations.
@parrondo I have pushed a separate temp. branch containing the MLDG code and notebooks: https://github.com/Kismuz/btgym/tree/develop_meta_learning_gradient
Some notes:
there is no learnable inner-optimiser update rate implemented yet, so one needs to play with the fast_opt_learn_rate param in ~[0.1, 0.0001];
uses a guided policy search loss which speeds up training, esp. at initial stages; annealed to zero over 10M steps; can be disabled by setting guided_lambda=0 in the trainer_config dict (see the sketch after these notes);
one needs either a lot of patience or a lot of cores, as training is almost 2 times slower;
There are two-and-a-half versions of the algorithm, differing in the way MDP tasks are defined:
aac.MLDG class and notebook a_MLDG; train_support, num_train_updates;
aac_2.MLDG_d class and notebook a_MLDGd;
aac_1.AMLDG_1 class and a_MLDG_1 nb.; it can be thought of as splitting the episode trajectory into smaller partial trajectories (rollouts) and conditioning every sub-trajectory on the previous one. TODO: can make a distribution like above; it would be a local replay buffer which is reset at the beginning of every episode. So it is a kind of closed-loop optimisation within a single episode.
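Purely illustrative: how the two knobs mentioned in the notes above would sit in the trainer_config dict passed to the launcher (assuming both are plain trainer kwargs; all other keys omitted):

```python
trainer_config = dict(
    # ...
    fast_opt_learn_rate=1e-3,  # inner (fast) optimiser step; try values in ~[0.1, 0.0001]
    guided_lambda=0.0,         # 0 disables the guided policy search loss term
)
```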
Thank you. Testing!
@parrondo et al.:
Some additions to the MLDG branch pushed:
guided_lambda should be raised to ~5.0 to keep up sufficient exploration; all MLDG variants affected.
OK Kismuz. Thank you. I am testing that.
Your whole framework is awesome. Maybe now it is time to get it to work for trading. I should start by reproducing some published result. This is a very interesting one: https://arxiv.org/abs/1706.10059 And it is interesting because it has established the neural net, the instruments and the full conditions, avoiding the hard work of looking for them. Obviously each trader must research their very best strategy parameters, but it is a minimum to reproduce some published results in order to be sure that we are not off track. The main obstacle is to get a multi-instrument data feed, which is not yet implemented in BTGym.
So, 1) What do you think about reproducing those results? 2) If not, are you trying to reproduce any results before you reach your own?
@Kismuz I am checking this one: https://arxiv.org/abs/1707.03141 (its v3 version), "A simple Neural attentive meta-learner"; the previous version was "Meta-learning with temporal convolutions", which is the meta-learning version of the WaveNet model. Here you can check an implementation that does not reach 99% but achieves 95% on 5-shot: https://github.com/devsisters/TCML-tensorflow .
Edit: https://arxiv.org/abs/1804.03782 have a look at this paper. I am studying the complexity of implementing your btgymserver with Twisted and making the environment decentralised. What do you think, Kismuz? Imagine a pool of decentralised agents ☺
@Kismuz This is what I wanted to say: https://arxiv.org/abs/1611.01779 I don't know if you are currently doing this in order to take the actions. For this reason I said that an LSTM is not the solution, because we have a stochastic prediction and LSTMs do not work well under these conditions. Maybe temporal convolutions could help, but I will definitely check the results of the algorithm when it makes predictions into the future, not just the current state.
@gaceladri,
"A simple Neural attentive meta-learner"
Yes, I'm working on it; at least on the temporal convolutions encoder; haven't got down to attention yet; a naive WaveNet implementation is painfully slow, so the Fast WaveNet algorithm should be OK;
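Not BTgym code, just a toy sketch of the dilated causal convolution stack (WaveNet-style encoder) being discussed, written with tf.keras layers:

```python
import tensorflow as tf

def temporal_conv_encoder(x, filters=32, dilations=(1, 2, 4, 8)):
    """x: [batch, time, channels] observation window -> encoded sequence."""
    h = x
    for d in dilations:
        # causal padding keeps the encoder from peeking into the future
        h = tf.keras.layers.Conv1D(
            filters, kernel_size=2, dilation_rate=d,
            padding='causal', activation='relu')(h)
    return h  # receptive field = 1 + sum(dilations) time steps
```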
...make the environment decentralised.
I think it could be done.
Good paper indeed; note that the setup described naturally fits the BTgym architecture: observation['external'] can be thought of as the sensory input and observation['internal'] as the measurements, using the authors' notation;
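In code that mapping is just the two standard modes of BTgym's observation dict (the names on the right are the paper's terminology, not BTgym API):

```python
# obs is a single observation dict returned by the BTgym environment
sensory_input = obs['external']   # market data window
measurements = obs['internal']    # broker / account state statistics
```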
The problem is that our task boils down exactly to 'predicting sensory input', and '...Prediction of full sensory input in realistic three-dimensional environments remains an open challenge...', as the authors admit; still a very interesting approach, worth trying to implement;
@parrondo,
What do you think about reproducing those results?
This work received a lot of attention and I think some people have already done this job, but there are some ideas from the paper I plan to implement, particularly EIIE, time-decaying data sampling and the rolling test period;
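A back-of-the-envelope sketch of the 'time-decaying data sampling' idea from that paper (geometrically biasing episode starts towards recent data; not BTgym code, and the decay parameter is arbitrary):

```python
import numpy as np

def sample_episode_start(num_points, beta=1e-4, rng=np.random):
    """Pick an episode start index, biased geometrically towards recent data."""
    age = min(rng.geometric(beta), num_points) - 1  # 0 = most recent point
    return num_points - 1 - age
```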
If not, are you trying to reproduce any results before you reach your own?
All this work is about reproducing others' results, actually :)
OK Kismuz, I mean trading paper results. Which one are you trying to reproduce?
Hi Kismuz, first of all, congratulations on this really great job!
OK, I have trained my workers, and here are some questions:
Do you have some code to test predictions with the trained model? Do you have some code to extract weights from the checkpoint file and deploy the model? If not, could you provide some guideline instructions on how to do that? How do you plan to implement the production test?
Thank you in advance.