Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

Get better signal strengths by tuning scaling hyperparameters #35

Closed ryanle88 closed 5 years ago

ryanle88 commented 6 years ago

Hi @Kismuz I need your help to clarify a few questions:

  1. Currently, Unreal Stacked LSTM Strat 4_11 performs better on actual FOREX prices than A3C Random, so out of curiosity I ran both of them on the sine-wave data (test_sine_1min_period256_delta0002.csv). It turns out that A3C eventually converges while Unreal Stacked LSTM is nowhere near convergence. Could you give me some insight into that?

LSTM results: [screenshots]

A3C results: [screenshots]

  2. I had an overflow issue with tanh(x_sma) while feeding my own data into the gym (SPY, Dow Jones, Russell 2000, and so on). After changing the T2 value from 2000 to 1, the issue went away, but I am not sure that was a proper fix. Could you help me shed some light on that?

  3. I also got a data consistency issue when passing in equity data traded from 8:30 am to 3:15 pm instead of 24/7 like the FOREX data you used in the examples. What should I do to fix this?

  4. I kept getting "[WinError 2] The system cannot find the file specified" when trying to run the A3C example on Windows. [screenshot]

  5. When I tried to run A3C with the new GPU-enabled version of TensorFlow on Windows, I also got the error below: [screenshots]

I really appreciate your effort in building this awesome gym and the great documentation.

Thank you in advance.

Kismuz commented 6 years ago

@ryanle88 ,

  1. If you mean running unreal_stacked_lstm_strat_4_11.ipynb on the sine data test_sine_1min_period256_delta0002.csv - I can confirm it converges almost instantly (in less than 20K iterations). I advise updating the package, as some minor tweaks have been made recently. Have you made any alterations to the code that could result in divergence?

  2. Yes, that is exactly the way to adjust signal strength. To verify it has proper amplification, pay attention to the TensorBoard image of the state:

[screenshot]

... and you can also change rendering options in the notebook: set env_config['kwargs']['render_state_as_image'] to False to visually inspect the curves. If they repeatedly look "squared" by tanh, lower the T parameter value. It is better to have the signal under- rather than over-amplified: then most of the spikes fit in the almost-linear part of tanh while outliers get compressed; ~[-0.9, +0.9] is fine.

  3. A quick fix is to raise the value of the time_gap param to cover the empty time periods. To save time, you can run btgym.datafeed.test_data.py (altering the dataset params to your needs) to check that your data passes OK.

  4. Windows again... seems it is the OS-specific glob implementation (it is used to compose the system path string). Unfortunately I can't replicate it; I don't have that OS installed.

  5. Seems to be a tf import error; can't comment on that. Does your tf installation work OK on other tasks? See also: #26

ryanle88 commented 6 years ago

@Kismuz Thank you for your quick response!

  1. The only code I changed was the initial cash amount, from 2,000 to 100,000. I pulled your latest code and reran the LSTM notebook for both values. It converged for 2,000 but not for 100,000, which seems strange to me. Not sure if signal strength has anything to do with it.

  2. I have a better understanding of this graph now; before, I did not know how to read it. [screenshot]

  3. I'll test it today with your suggestions and let you know if I can figure it out.

  4. It was my fault. I had installed the nightly build, which came with a bug. The import statement works now with the release version 1.5 of TensorFlow (GPU).

Kismuz commented 6 years ago

@ryanle88:

The only code I changed was the initial cash amount, from 2,000 to 100,000. I pulled your latest code and reran the LSTM notebook for both values. It converged for 2,000 but not for 100,000, which seems strange to me. Not sure if signal strength has anything to do with it.

Well, because changes in initial cash amount require corresponding changes in stake size. Otherwise the max profit / max loss margins (which are set as a percentage of initial cash) become bigger, and within the given episode time span it is impossible to make significant changes to the broker account value --> no noticeable changes in broker statistics --> no proper reward estimation (because reward terms get normalised w.r.t. max possible profit/loss values) --> agent gets confused :)
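
To illustrate that arithmetic with a hypothetical sketch (not btgym's actual accounting; the function name, the 10% margin and the per-unit P/L figure are made up for illustration):

```python
# Hypothetical sketch (not btgym code): margins are a fixed percentage of
# initial cash, so scaling cash without scaling the stake shrinks the
# fraction of those margins the agent can actually reach within an episode.
def reachable_fraction(initial_cash, stake, max_pnl_per_unit, margin_pct=0.10):
    margin = initial_cash * margin_pct        # absolute profit/loss bound
    reachable = stake * max_pnl_per_unit      # what trading can actually move
    return reachable / margin

f_small = reachable_fraction(initial_cash=2_000, stake=1, max_pnl_per_unit=100)
f_big = reachable_fraction(initial_cash=100_000, stake=1, max_pnl_per_unit=100)
# f_small is 50x f_big: with 100,000 cash and the same stake, broker stats
# barely move relative to the margins, so normalised rewards degenerate.
```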

I have a better understanding about this graph now

may be more convenient:

env_config = dict(
    class_ref=BTgymEnv, 
    kwargs=dict(
        .......
        render_state_as_image=False,
        .......
    )
)
ryanle88 commented 6 years ago

Thank you for the detailed explanation, @Kismuz !

    Quick fix is to rise value for time_gap param to cover empty time periods. You can run 
    btgym.datafeed.test_data.py (alter dataset params to you needs) to see if it passes ok to save time.

So I went back and changed time_gap as you recommended, and it works. What I am still not clear on: the maximum day gap in my dataset is 5 days, but I have to set time_gap to at least 7 days and 1 hour to make it work (no data consistency error). Is that correct?

I also tried to play around with btgym.datafeed.test_data.py as you suggested but couldn't make it work at all. I wonder if you could explain these terms (trial_params, episodes_params, target_period, sample_duration, and episode_duration) and how they are connected; I went over your code and read the comments you wrote about them, but it still hasn't clicked for me.

One thing I realized while modifying time_gap is that its value can impact how the agent learns. When I set time_gap >= 8 days, the agent learned pretty well compared to when it was set to 7 days. Not sure why there is such a large difference.

Also, should the final-value curves of all the trained agents start at the same spot (initial cash)? Did I do anything wrong?

[screenshot]

The next question I have is: what is the best way to tune T, initial cash, and stake to get the best signal strength? Trial and error? What happens when I want to include other stocks and indices with large differences in price levels (AMD: 12.45, FB: 190, GOOG: 1111, ^GSPC: 2762)? What should the T value be? Should I normalize the data before feeding it into the gym?

To make it even more realistic, what if I want to add a couple of technical indicators (SMA, Stochastic, Bollinger Bands)? What should I do to find the optimal state representation with my available dataset (SMA30, SMA60, SMA90, SMA180)? Should I run it through PCA or a CNN first?

I really appreciate you taking the time to help me. If you need help with minor tasks like updating the documentation for btgym or the like, please let me know.

Thank you again.

Kismuz commented 6 years ago

@ryanle88 ,

what is the best way to tune T, initial cash and stake to get the best signal strength?

Initial cash, stake and leverage are pure trading arithmetic; one should put the agent in sensible conditions, i.e. the goal set should be reachable. As a sanity check of your broker setup, a randomly acting agent should be able to drain your trading account within roughly 2/3 of the maximum episode duration, should be able to make at least one stake, etc. See also: #36

Signal scaling and preprocessing: I have tried not to use z-normalisation, PCA, whitening etc. because all of those depend on your dataset distribution and work well when the latter is stationary, as assumed in supervised/unsupervised learning. Here we get a non-stationary distribution of states, so it makes sense to use methods that are not themselves conditioned on the data. To center the signal, finite differences are used, then the result is scaled and bounded by passing it through a non-linearity. I'm not saying it is the best method. It is worth checking how much the dataset statistics drift with time; if the drift is minimal, maybe it is OK to call the statistics stationary and use PCA etc. A good direction for research, btw. To illustrate the effect of the scaling parameter I prepared and pushed a notebook, check it out: https://github.com/Kismuz/btgym/blob/master/examples/state_signal_scaling.ipynb
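
A standalone numpy sketch of the centering and bounding idea (illustrative values and names, not the exact btgym pipeline; the drifting "price" series and the 2000 scale factor are made-up assumptions):

```python
import numpy as np

# Sketch: finite differences remove level and trend, then tanh scales and bounds.
rng = np.random.default_rng(1)
t = np.arange(500.0)
price = 1.10 + 1e-5 * t + rng.normal(scale=2e-4, size=t.size)  # drifting "price"

dx = np.gradient(price)        # centered near zero although the raw price is ~1.10
state = np.tanh(dx * 2000.0)   # bounded to (-1, 1); 2000 is an illustrative T

# Under-amplification check: most values should sit in the near-linear band
# of tanh, roughly within [-0.9, +0.9], with only outliers compressed.
frac_linear = np.mean(np.abs(state) < 0.9)
```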

...will comment on other questions later.

tmorgan4 commented 6 years ago

@ryanle88

If you haven't already I would suggest looking at a book called "Statistically Sound Machine Learning for Algorithmic Trading of Financial Instruments: Developing Predictive-Model-Based Trading Systems Using TSSB".

Amazon link

Much of the book is written as a user manual for their (free) software called TSSB, but if you're able to think outside the box there is an enormous amount of good information on statistics, normalization of data/indicators, feature selection, rules of thumb, etc. Obviously it requires a mathematical background, but it's nowhere near as difficult to read as many of the machine learning papers with complicated notation and proofs. I'm quite surprised the book hasn't received much attention, but I think many of the poor reviews came from readers expecting a plug-and-play recipe for a profitable trading system, and that's not what's inside.

@Kismuz if you haven't read it and think you would I'd gladly try and get a copy to you as a thank you for your work on this project. Can't remember where you're located.

This is the link to the actual user manual for the software. The book covers much of the same material and a bunch more.

TSSB User Manual

ryanle88 commented 6 years ago

@Kismuz Thank you for creating the state signal scaling notebook! It is awesome. It saves a lot of time getting the signal strength just right before feeding the data into the gym. [screenshot]

@tmorgan4 Thank you for the recommendation.

Kismuz commented 6 years ago

@tmorgan4, I really appreciate your offer; I came across this book some time ago and can confirm it is a good cookbook for the tasks discussed, as well as some others. It was one of the sources that inspired me to work with wavelet transforms as a preprocessing technique; my early btgym preprocessing was based on CWT with a Ricker wavelet. I later replaced it with temporal convolutions, but it gave some valuable insights. Btw, I live in Moscow, RU.

Kismuz commented 6 years ago

@ryanle88, I made and pushed some simple updates to the notebook and it reveals we have a problem here: when plotting the signal distribution on a per-channel basis it becomes clear that it is essential to apply weighted scaling. That explains my (counterintuitive) empirical observation that over-scaling brings faster convergence :/ Should have been paying more attention to it. Clearly there is room for improvement.

ryanle88 commented 6 years ago

Thank you for the update, @Kismuz! This makes more sense to me since we need different scaling for different equities with different levels of magnitude. Are you planning to make the change in the near future? Or should I create a strategy method which inherits from the base strategy and make the change accordingly in the meantime?

Kismuz commented 6 years ago

@ryanle88, I would not touch the base method until absolutely necessary, as it merely serves as a template; the observation and reward it computes can be considered dummy ones. Subclassing it is the suggested method for doing research work, and the btgym research sub-package is the proper place to put it in. You can browse btgym.research.strategy_gen_4.py to see the evolving class hierarchy and how, say, the reward function or observation space computations have been shaped. Making such prototypes (along with proper git branching) brings flexibility: it is easy to swap versions when training from notebooks to separate the effects every version brings, or to roll back if something goes wrong. When your research is done and proved useful, it can be collapsed to a single class and moved to one of the main stable sub-packages.

ryanle88 commented 6 years ago

@Kismuz Sorry for not being clear. What I meant is exactly what you just suggested, since I don't want to directly make any changes to your code, which may cause trouble later on when performing a git pull.

ryanle88 commented 6 years ago

@Kismuz Thank you for taking your time making the changes regarding the signal strengths that I struggled to figure out.

After pulling the latest code today, I ran the State Signal Scaling notebook again (https://github.com/Kismuz/btgym/blob/master/examples/state_signal_scaling.ipynb), but this time I adjusted state_ext_scale ([3000., 2600., 2200., 1000., 1000., 1000.]) to get proper signals from all of the SMAs (using the same dataset that you provided: DAT_ASCII_EURUSD_M1_2016.csv). Everything worked out great.

Then I decided to add the OHLC columns to see what would happen: [screenshot]

I set state_ext_scale = np.array([1000., 1000., 1000., 1000., 3000., 2600., 2200., 1000., 1000., 1000.]) and these are the results: [screenshots]

I am not sure if this is expected or there is something else that I need to adjust to get the proper signal strengths for all the columns (OHLC and SMAs).

Thank you for your help again!

Kismuz commented 6 years ago

@ryanle88, is it those first three shapes that look confusing to you?

I'll try to loosely formulate what-is-for in state processing pipe:

First, we take a set of signal statistics which are 'similar' but differ in time scale; here the signal statistic is a plain SMA and the time-scaling factor is the SMA period. We want to get a matrix of somehow-encoded information about a relatively long signal period (because to compute one SMA256 value you need 256 previous signal values, 32 for SMA32, etc.). Then we take, say, the 30 last values of every statistic to form a matrix shaped [num_of_values, num_of_stats]. It bears information about the last 256 signal values, but the encoding gets 'coarser' from the present time back to the -256th point; this is similar to the 'temporal-convolution' idea. So, again, those statistics should all be of a single type (SMA, oscillators, wavelet transforms, windowed Fourier transforms, etc.) and should differ in one scale only.

Second, we take one-step differences along the features axis; we don't want to touch the time axis yet (we will do that later). Taking one-step differences (np.gradient, which is no more than finite differences with smoothing and fancy endpoint treatment) removes trends, centers and stabilises the signal. After this operation we can no longer eyeball the source signal [we can only look at the distribution of difference values, as above], but the neural net will be happy with it. At this point we perform scaling to get a more consistent distribution across channels (we can now call them hard-coded low-level features). Tanh here is pure elastic scaling and can be thought of as 'compressing the dynamic band' of your signal and fighting outliers.

Now we have our state as a 'vector in time-features space', and it is long in the time dimension (say, 30) and short in the features dimension (say, 5). This vector represents a single observation state of the POMDP. To feed it to a [possibly context-aware] policy estimator we want to compress and reshape it to be short in the time dimension (1) and long along the features dimension (~64).

So, third, we pass it through the convolutional encoder of our LSTM agent; unlike with images, we convolve along the single time dimension (4 layers), and at the end we get 64 kernel activations. Those activations can be thought of as a learned feature representation of our once-was-time-embedded state (actually we get 2x64 features, but that's OK). Now we can pass it forward to an RNN, feedforward net or linear classifier of choice to get the policy, values, etc.

Looking at your code it is clear that the first three stats are not from the same family as the rest. OK, the Open value can be thought of as SMA-1, but Low is always lower [than Open and Close] and High is, well, higher. So when we take differences along the statistics axis, the first channel will consist of only positive differences and the second and third of only negative ones, all at the same time scale; that's why you get those 'halved' distributions.

If your aim is to bring more 'resolution' to the state, I suggest stacking Open-SMA2-SMA4-..., or you can use the HL median instead of Open: HL = (High+Low)/2; using full OHLC is redundant imho.

Bottom line: the pipe explained here is obviously not the only 'correct' one. It is just one approach which happens to work quite well empirically, so it is worth bearing in mind the initial intuition behind it when making modifications; though the let's-try-and-see approach is valuable in the sense that analyzing unexpected observations can give rise to new insights. Anyway, efficient state representation is still an open problem and I encourage you to try different conceptions and ideas.
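
Loosely, the first two steps can be sketched in plain numpy (the SMA helper, periods, window length and scale factor below are illustrative assumptions, not btgym's actual code; the conv-encoder step is left out):

```python
import numpy as np

rng = np.random.default_rng(2)
signal = np.cumsum(rng.normal(scale=1e-4, size=1024)) + 1.10  # synthetic "price"

def sma(x, period):
    # trailing simple moving average, same length as x
    kernel = np.ones(period) / period
    return np.convolve(x, kernel, mode='full')[:x.size]

# Step 1: stack same-type statistics differing only in time scale.
periods = [16, 32, 64, 128, 256]
stats = np.stack([sma(signal, p)[-30:] for p in periods], axis=-1)  # [30, 5]

# Step 2: one-step differences along the features axis, then scale and bound.
dx = np.gradient(stats, axis=-1)   # trends removed, signal centered
state = np.tanh(dx * 1000.0)       # 1000 is an illustrative scaling factor
```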

ryanle88 commented 6 years ago

@Kismuz Yes, the first three distributions are those that confused me. I am still in the process of exploring to find the right state representation that I can work with based on the information that I have available. Thank you for providing such a detailed explanation. Really appreciate that!!

ryanle88 commented 6 years ago

@Kismuz I have spent some time over the past two days exploring different state representations to see if I can find anything useful. My rationale was that whatever information a human needs to make an informed decision is also required for a machine to make its own decision, since the machine is just a lot better than us in memory and speed, not in logical thinking and decision-making.

For example, for Atari games, we humans only need to see the screen to decide whether to move up or down, left or right; that is why reinforcement learning performs so well on those games by feeding four consecutive frames of screen pixels to the neural network.

However, trading is a lot more complicated, since a trader cannot make a trading decision based solely on historical OHLC prices. Depending on the type of trade, the trader needs to see many different things before deciding on the type of strategy to proceed with: S&P 500, Dow Jones, Russell 2000, VIX, the day of the week, the current month, earnings announcement dates, the price distribution of the underlying, Bollinger Bands, the TTM Squeeze signal, and many more.

Therefore, I tried to add extra technical indicators (besides SMAs) like Stochastic, ATR, Momentum Oscillator, Crossover, Bollinger Bands, and some ETFs like DIA, FXB, GLD, SPY, TLT, UNG, and so on. However, they did not work as I expected because of the signal strengths: some of the indicators and ETFs have values way higher than the value I am trying to predict, so I did not get the proper signal strengths for the agent to learn (as you explained above).

[screenshots]

At this point, I am not sure how to move forward with this, but will keep learning and exploring in the meantime to see how other fields can overcome this type of challenge.

This is just an FYI. Keep up the good work!!

Kismuz commented 6 years ago

@ryanle88

My rationale was that whatever information is required for a human to have to make an informed decision is also required for a machine

Your idea is correct, but the implementation goes against the principles I wrote about: for a convolutional network to learn relevant features, the signal should be somehow consistent across the entire gradient field (see above). Personally I can't give any sensible meaning to differences between SMA values and Oscillator values; it is not about normalising, it is like comparing apples and horses; I don't think it would work out this way.

I am not sure how to move forward with this

One way to start experimenting with new features is to structure them by putting every 'feature bank' into a separate convolutional encoder, then concatenating the learnt features and feeding the entire encoded state tensor to the RNN estimator. So it can be an sma-encoder, a momentum_osc encoder, etc., each of them taking a matrix of same-type statistics at different timeframes.

that is why reinforcement learning technique performs so well on those games by feeding four consecutive screenshot pixels

exactly, but not frames and sounds together; if one had to, one would encode pixels and audio separately

Please also see here: https://github.com/Kismuz/btgym/issues/23#issuecomment-348951755

ryanle88 commented 6 years ago

Got you. Thank you for your help pointing out the directions, @Kismuz !

JaCoderX commented 5 years ago

The only code I changed was the initial cash amount, from 2,000 to 100,000. I pulled your latest code and reran the LSTM notebook for both values. It converged for 2,000 but not for 100,000, which seems strange to me. Not sure if signal strength has anything to do with it.

Well, because changes in initial cash amount require corresponding changes in stake size. Otherwise the max profit / max loss margins (which are set as a percentage of initial cash) become bigger, and within the given episode time span it is impossible to make significant changes to the broker account value --> no noticeable changes in broker statistics --> no proper reward estimation (because reward terms get normalised w.r.t. max possible profit/loss values) --> agent gets confused :)

Initial cash, stake and leverage are pure trading arithmetic; one should put the agent in sensible conditions, i.e. the goal set should be reachable. As a sanity check of your broker setup, a randomly acting agent should be able to drain your trading account within roughly 2/3 of the maximum episode duration, should be able to make at least one stake, etc.

Going over this post got me thinking: what will the effect be on markets that have huge price movements (like the crypto market)? The ratio between a fixed-size stake and the initial cash will vary greatly between episodes, which I assume will get the agent 'confused'.

I checked the code to see if there is a way to deal with the issue. I guess that if I set portfolio_actions = {} it will force the system to use a continuous action (that is percentage-based), but this way we lose the simplicity of discrete actions.

What will be the proper way to deal with such scenario?

tmorgan4 commented 5 years ago

@JacobHanouna You're on the right track, and it's one of many issues that need to be dealt with in a non-stationary time series. For this specific issue I'd suggest looking into the backtrader PercentSizer. Instead of each trade being a fixed number of shares or contracts (or bitcoin), the size of the trade is now a % of your cash. Make sure to consider the effects this could have on returns (i.e. does the volatility vary as a % of price or is it a fixed value?). An easier example to visualize is the way a $1000/share stock can fluctuate $50 in a day, but you'd be less likely to see this behavior from a $100 stock. This is why you often see prices being scaled by log(price).
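
The percent-sizing idea can be sketched without backtrader (this mimics what bt.sizers.PercentSizer does conceptually; the helper below is an illustration, not its actual implementation):

```python
def percent_size(cash, price, percents=25):
    """Whole units to trade so the position is roughly `percents`% of cash."""
    return int(cash * percents / 100 / price)

# The stake now tracks the account instead of being fixed, so the
# stake-to-cash ratio stays stable across very different price levels:
size_cheap = percent_size(cash=10_000, price=100)    # 25 units
size_dear = percent_size(cash=10_000, price=2_000)   # 1 unit
```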

https://www.backtrader.com/docu/sizers-reference.html

JaCoderX commented 5 years ago

@tmorgan4 thank you for replying.

Just to confirm, so the answer is complete if anyone comes across the same issue: I can add cerebro.addsizer(bt.sizers.PercentSizer, percents=25) to the cerebro instance, which goes into the env config, and make sure order_size=None is set in `cerebro.addstrategy(...)`. This will change BTGym behavior to use a percentage-type sizer, right?

JaCoderX commented 5 years ago

I ran some tests with this setup. The agent's behavior is not 100% clear.

This is the code in the base strategy:

    # Try to define stake, if no self.p.order_size dict has been set:
    if self.p.order_size is None:
        # If no order size has been set for every data_line,
        # try to infer stake size from sizer set by bt.Cerebro.addsizer() method:
        try:
            assert len(list(self.env.sizers.values())) == 1
            env_sizer_params = list(self.env.sizers.values())[0][-1]  # pull dict of outer-set sizer params
            assert 'stake' in env_sizer_params.keys()

From the code it can be understood that setting order_size=None makes BTGym look for an implicit definition via bt.Cerebro.addsizer() by the user. But the problem is that bt.sizers.PercentSizer doesn't have a 'stake' param, so the assert raises an error. Commenting out the assert doesn't help, because later in the code BTGym expects the 'stake' key, so an error happens anyway.

I tried setting order_size to the percent value and it seems to work fine (I checked that it is indeed 25% on TensorBoard), but then we completely bypass this piece of code.

So it is a bit of a workaround, and the behavior of the code is ambiguous.

Kismuz commented 5 years ago

@JacobHanouna, this was made for fixed stakes; indeed, it looks for a dictionary of sizers corresponding to every asset traded; if none is found, it uses the single value added by .addsizer for all. It's a bit outdated anyway, in the sense that, as it turns out, it is much more flexible to estimate and explicitly set the order size when submitting individual orders, by setting the size kwarg (as well as the asset).

Kismuz commented 5 years ago

UPDATE ON SIGNAL SCALING: after playing with different preprocessing/scaling/normalising routines, it seems that one simple solution based on std normalisation is quite robust to different price levels and volatility regimes and requires no manual tuning. See implementation example code here: https://github.com/Kismuz/btgym/blob/master/btgym/research/model_based/strategy.py#L189
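
A minimal numpy sketch of that idea (illustrative only; it follows the `scale = 1 / np.clip(std, 1e-10, None)` pattern discussed in this thread, while the helper name and the synthetic series are made up): the same routine yields comparably scaled features for assets at very different price levels.

```python
import numpy as np

def std_scaled_diff(window, eps=1e-10):
    # rescale the window by its own std (clipped to avoid division by zero),
    # then difference to center the signal
    scale = 1.0 / np.clip(np.std(window), eps, None)
    return np.gradient(window * scale)

rng = np.random.default_rng(3)
cheap = 12.45 * (1.0 + np.cumsum(rng.normal(scale=1e-3, size=256)))   # ~$12 asset
dear = 1111.0 * (1.0 + np.cumsum(rng.normal(scale=1e-3, size=256)))   # ~$1100 asset

a, b = std_scaled_diff(cheap), std_scaled_diff(dear)
# despite a ~90x difference in price level, a and b have similar magnitudes
```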

JaCoderX commented 5 years ago

@Kismuz can you please explain your statement above? I thought that order_size is a fixed amount for the whole run:

It's a bit outdated anyway, in the sense that, as it turns out, it is much more flexible to estimate and explicitly set the order size when submitting individual orders, by setting the size kwarg (as well as the asset).

Kismuz commented 5 years ago

@JacobHanouna ,

order_size is a fixed amount for the whole run

It need not be; see the backtrader documentation. It is convenient as a simplification, but one can allow the agent to vary the order size :) More realistic examples: 1) when you trade a co-integrated pair of assets, estimate the co-integration coefficient at the start of the episode, then set the order size ratio accordingly; or 2) you can choose to estimate an optimal (in some sense) single order size and leverage given the current unit price level and account cash.

JaCoderX commented 5 years ago

So if I understand correctly, we need to set order_size so the system has some default, but a better option is to control it from the strategy endpoint.

you can choose to estimate optimal (in some sense) single order size

Sounds like order_size should be learned as part of the network outputs.

Kismuz commented 5 years ago

...a better option is to control it from the strategy end point

exactly

order_size should be learned as part of the network outputs

optimal order sizing is one of the future research directions; a related and promising line is to learn optimal limit-order placement instead of using market orders (which are expensive and less efficient in practice).

JaCoderX commented 5 years ago

UPDATE ON SIGNAL SCALING: after playing with different preprocessing/scaling/normalising routines it seems that one simple solution based on std normalisation is quite robust to different levels of price or volatility regimes

@Kismuz I'm modifying MonoSpreadOUStrategy_0 to work with more indicators besides SMA. What I'm trying to understand is how to generalize the solution properly, especially the scaling operation.

I have some questions regarding the scaling code:

scale = 1 / np.clip(self.data.std[0], 1e-10, None)
x_sma *= scale
dx = np.gradient(x_sma, axis=-1)
  1. Isn't scale equivalent to just 1 / max(self.data.std[0], 1e-10)? (self.data.std[0] is just a scalar)
  2. Going over the mathematical definition of standardization from Wikipedia, x' = (x - mean(x)) / std, aren't we lacking the mean(x) part? Something like:

    def set_datalines(self):
        self.data.mean = btind.SimpleMovingAverage(self.data.open, period=self.p.time_dim)
    ...
    def get_external_state(self):
        # each element in the feature should have self.data.mean[0] subtracted,
        # and then the result multiplied by the scale factor?

  3. If I want to use other indicators, let's say for example RSI, I assume I need to find the RSI standard deviation separately and then apply the same type of solution used for SMA? But then not all features would have the same magnitude (RSI is between 0 and 100); is that even important? Something like:

    self.data.rsi = btind.RSI(self.data.open, period=16)
    self.data.rsi_std = btind.StdDev(self.data.rsi, period=self.p.time_dim)

  4. A bit off topic, but is there a reason you are using self.data.open instead of self.data.close?
Kismuz commented 5 years ago

@JacobHanouna,

isn't scale equivalent to just => 1 / max(self.data.std[0], 1e-10)?

yes, exactly; it's just my personal preference to use np.clip() for handling numerical issues

aren't we lacking the mean(x) part

Yes, we are. Note that we don't actually aim for full Z-score normalisation, 'cause we use a differencing operator afterwards. I think the best way to get a sense of it is to try this option and see how it affects performance.

find the RSI standard deviation separately

I think yes, 'cause that's quite another type of signal. I further suggest using separate convolutional encoders for any other type of features; the lstm_policy class can easily handle this case if a properly specified observation space is received.

reason you are using self.data.open instead of self.data.close

No specific reason.

PS: observation and statistics variance scaling is quite an important issue and definitely should be thought about, in the sense that price volatility (~observation variance) is not only a normalisation statistic but also an important factor of the asset's potential profitability itself; then it makes sense to scale the potential max profit/loss bounds and [possibly] the reward function w.r.t. the [rolling] variance of the tradable asset. As of now, the max_profit and max_drawdown limits are set manually, which is not optimal. I'm working on it and will let you know of any new results.

JaCoderX commented 5 years ago

@Kismuz thanks for the clarifications.

I have modified the code to include the mean value for full Z-score normalization and am currently testing it. I will post my conclusions once the run finishes.

Update: I ran the test with mean subtraction and the results are exactly the same, which makes sense: the mean is just a constant that shifts the z-score to revolve around zero, and applying the gradient cancels out the constant.
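
That conclusion is easy to check numerically (a standalone sketch, not btgym code):

```python
import numpy as np

# np.gradient of a constant is zero, so subtracting the mean (a constant
# within the window) before differencing leaves the result unchanged:
rng = np.random.default_rng(4)
x = np.cumsum(rng.normal(size=128))

same = np.allclose(np.gradient(x), np.gradient(x - x.mean()))
```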

JaCoderX commented 5 years ago

I further suggest using separate convolution encoders for any other type of features

@Kismuz is that done by adding a new entry in state_shape (and adding a corresponding get_state method)?

Kismuz commented 5 years ago

@JacobHanouna, yes, in particular by making the external state shape a nested dictionary itself. Look at the examples here: https://github.com/Kismuz/btgym/blob/master/btgym/research/casual_conv/strategy.py#L541 and https://github.com/Kismuz/btgym/blob/master/btgym/research/casual_conv/strategy.py#L682

Think of every asset specified as a different feature bank.

Also note the policy kwarg share_encoder_params, which tells whether encoder variables should be shared or not (sensible to set to True if all data streams are of a similar type):

https://github.com/Kismuz/btgym/blob/master/btgym/algorithms/policy/stacked_lstm.py#L33

JaCoderX commented 5 years ago

think of every asset specified as different feature bank

sounds like a cool feature to experiment with :)

Also note policy kwarg share_encoder_params telling either encoders variables should be shared or not

Is there a difference between multiple encoders with shared params (each with a partial input space) and a single encoder (with the whole input space)?

Kismuz commented 5 years ago

@JacobHanouna ,

is there a difference between multiple encoders with shared params .... and a single encoder

Yes, a bit. Consider stacking several similar signals to feed a single encoder: it all gets mixed channel-wise. Feeding the same into shared encoders makes the whole thing behave more like signal-wise separable convolutions, so the activations and features learnt will be different, though (in theory) equivalent.

JaCoderX commented 5 years ago

in particular by making external state shape nested dictionary itself

@Kismuz just to clarify the process of separating encoders: is it required to have the external state shape as a nested dictionary, or is it sufficient to just return a dictionary from get_external_state() in order to have separate encoders?

Kismuz commented 5 years ago

@JacobHanouna, yes, it is required to declare the shape as a nested dict AND to return a dictionary from get_external_state(), since I haven't implemented automatic nested method lookup; see:

https://github.com/Kismuz/btgym/blob/master/btgym/research/casual_conv/strategy.py#L544

and

https://github.com/Kismuz/btgym/blob/master/btgym/research/casual_conv/strategy.py#L682