Using 5-,10-,30-minute data feed rather than 1-minute and the timeframe argument in BTgymRandomDataDomain

ALevitskyy commented 6 years ago

HI, I spent some time looking over the guided_a3c example, documentation and reading the code for the BTgymRandomDataDomain class, and my question is: if I want to use pandas.resample() function and downsample the histdata.com data to lower frequency (for example to 5M frequency) and feed it to btgym, is the only thing I need to change is the timeframe argument when initiating BTgymRandomDataDomain? My thought was that 1M data is very noisy and does not add much information relative to 5M data, so training on 5M (or even 30M, as to me everything with higher frequency does not add much) data should be more fruitful. To compensate, one can use more than one year worth of data. Am I wrong?

I am sorry in advance if the question appears to be stupid (I am quite a newbie) and thank you very much for the wonderful gym environment and showing examples of algorithms which I thought was only for the gods in Google Deepmind and which I would never get my hands on.

Kismuz commented 6 years ago

@ALevitskyy,

if I want to use pandas.resample() function and downsample the histdata.com data to lower frequency (for example to 5M frequency) and feed it to btgym, is the only thing I need to change is the timeframe argument when initiating BTgymRandomDataDomain?

From the data pipeline side, yes. Since whole thing is based on backtrader, data resampling approach is the same as when working and resampling timeframes/bars for conventional backtrader data/strategies. Look for backtrader.com docs and forum for some hints on that topic.
From RL environment side, there is 'skip_frame' strategy parameter you should change accordingly: skip_frame sets number of backtrader strategy steps which is considered one step of RL environment, i.e. if you feed 1 min bars and set skip_frame=10 --> every call to env.step() advances time by 10 minutes it restricting number of agent iterations with environment. I have found it essential for convergence with short timeframe data (no agent was able to converge with 1min data and skip_frame < 5). It can be explained by high variance of data statistics (due to noise as you mentioned). So if you feed 5 min bars leaving skip_frame=10 only allows your agent interact with environment every 50 minutes which is quote rough.

My thought was that 1M data is very noisy and does not add much information relative to 5M data, so training on 5M (or even 30M, as to me everything with higher frequency does not add much) data should be more fruitful. To compensate, one can use more than one year worth of data.

yes, 1 min is noisy but my reasoning was it is better have good initial resolution one can always downsample later when constructing features via SMA bank, CWT etc. Indeed, by my experience any unfiltered signal under 4 min timeframe only adds noise. So trading initial resolution for bigger train set and faster execution time can be reasonable.

ALevitskyy commented 6 years ago

Thanks, got all my questions answered. I want to try to use TA-lib to feed in different technical indicators rather than raw time-series data and to see whether it helps or makes matters worse. And using TA-lib for pattern recognition on 1 minute bars is a bit pointless. I am still deciding whether to alter the original feed to 15M bars or use resample() on strategy/policy level, and downsample/generate features after receiving 1M bars from server, as you suggested.

Also, a bit off-topic but related to guided_a3c example I referred to before and time dimension. The example imports GuidedStrategy_0_0 from btgym/research/gps/strategy.py. In that file time_dim = 30 is set on line 16. Does it mean that the example only looks at data 30 minutes back? I guess that`s a reasonable parameter for sin function demo, but I would change it for training on ForEx data. Assuming that the data-generating process has long memory, I was wondering what is a reasonable parameter or if you would be willing to share what value of the parameter you used in your experiments? Would using something like time_dim = 1800 significantly slow training down?

Sorry again if any of my questions appear to be stupid. I will let you know if any of my experiments would improve on generalizability to test data and will share my code if I get something to work.

Again thanks for the gym environment, I was thinking for writing something similar for learning purposes, but now can focus on learning more interesting things!

Kismuz commented 6 years ago

@ALevitskyy,

The example imports GuidedStrategy_0_0 from btgym/research/gps/strategy.py. In that file time_dim = 30 is set on line 16. Does it mean that the example only looks at data 30 minutes back? I guess that`s a reasonable parameter for sin function demo, but I would change it for training on ForEx data. Assuming that the data-generating process has long memory, I was wondering what is a reasonable parameter or if you would be willing to share what value of the parameter you used in your experiments? Would using something like time_dim = 1800 significantly slow training down?

yes, it exactly means that single environment state incorporates past 30 min of data;
it is sufficient if you employ context-aware estimator like RNN network (LSTM in this case), because it it encodes previous timesteps in it's inner state (context vector); still it works slightly better if you set time_dim=128 for real data - found it optimal in terms of speed/performance.

See also #23 and this except particularly::

I would point that algorithms itself as absolutely data agnostic, it is parametrised policy estimator architecture that has been tuned for particular input type. And it CAN handle temporal relationships right from first DQN atari game time, even when simple convolutional feedforward architecture was used. In brief, intuition comes from dynamical systems theory, from Takens embedding theorem in particular. It states, roughly, that for any dynamical system S unfolding in discrete time, there exists a finite number N, such as at any time moment entire system dynamics is described by vector of N last states V[t] = [S[0], S[-1], ..., S[-N]], called time-embedding. Note that by above theorem dynamical system S' consisted of states V is always 'markovian', even if original system is not. That's why all feedforward RL estimators in atari domain use 'frame-staking' feature, usually 4 frames. This is, any atari game needs just time-embedding of 4 to became markov decision process, thus enabling correct application of Bellman equation which is in the heart of above mentioned RL algorithms. When you employ rnn estimators, it is exactly rnn hidden state from previous step that holds all the time embedding information in 'compressed' form. But It seems that in practice we need both time-embedding AND rnn context to learn good (=disentangled) spatio-temporal representations as recently noted: https://arxiv.org/pdf/1611.03673.pdf https://arxiv.org/pdf/1611.05763.pdf

mysl commented 6 years ago

Indeed, by my experience any unfiltered signal under 4 min timeframe only adds noise

@Kismuz , hi, I have a question w.r.t this, do you think if using first level tick data would help? Intuitively, if the intent is to catch some market microstructure level alpha, bid/ask/price would have more information which is missing by downsampling into 1 min bar. And another direction, may be use alternative sampling method from tick such as volume, instead of time?

Kismuz commented 6 years ago

@mysl, I think using first level data can help as long as we talk about usual arbitrage (maybe at short time segments) but not HFT strategies. Reason is we need 'no market impact' assumption to hold to perform valid backtesting, which is not the case for HFT strategies.

may be use alternative sampling method from tick such as volume, instead of time?

how to sync several assets if one wants to set multi-instrument trading?

mysl commented 6 years ago

@Kismuz , thanks for reply

how to sync several assets if one wants to set multi-instrument trading?

The reason I am suggesting this is because I find that currently the policy is overfitting with the training data, and perform poorly in testing period, which is probably caused by distribution shift in non-stationary env. I know that you are attacking the problem with meta learning approach, do you have any progress or update on this?
What I am thinking is from the other side of the problem, can we re-frame the input from the environment so that it's more stationary and thus more friendly to current method? I am starting to read the book 'Advances in financial machine learning' from Marcos Lopez, in some chapter, he mentioned that alternative bars have better statistical properties than time bar. There are other methods worth trying in his book. Anyway, IMHO, what we need first is to make the single instrument work in live trading, right? After that, if someone wants to extend to multi-instrument, it may add a time line in backtrader for sync. It may not be trivial, but not impossible

Kismuz commented 6 years ago

@mysl,

attacking the problem with meta learning approach, do you have any progress or update on this?

Actually not much of a progress has been made. Slight improvement is evident but it is far from satisfactory. Meta-learning is can improve results but it is not a magic pill. It appears 'zero' generalisation for our setup is due to the fact architectures used where unable to find/extract temporal patterns which are invariant (or, better say equivariant) to domain shift and noise. There are several directions i/m working on now (temp. suspending meta-learning direction):

better architecture to state encoder/algorithm:
- using casual convolutions with attention; implemented and gives better results than using conventional convolutions; code can be found at /btgym/research/casual_conv/; will publish demo notebook soon;
  - engineering better signal features which promote appearance of invariant patterns; one of promising directions (apart from hand engineering) is approach from 'Learning Sparse Wavelet Representations' paper, https://arxiv.org/abs/1802.02961; SSA decomposition is another direction for feature selection;
  - employing distributional Q-learning, https://arxiv.org/abs/1707.06887 or it's variants (like idea of splitting one environment into several simpler ones and learning separate value function for each)

using multiply correlated assets to archive better predictive performance on single asset trading or multiply assets trading (and later setting just becomes cont. portfolio optimisation problem), i.e. feeding EURUSD EURCHF EURJPY..., etc. data simultaneously to catch high level correlations. This is work in progress and I hope to push it in a week or so.

mysl commented 6 years ago

@Kismuz , thank you very much for sharing the information. All sound very interesting! Looking forward to your new publish :-)

developeralgo8888 commented 6 years ago

@Kismuz , to add to the discussion

" using multiply correlated assets to archive better predictive performance on single asset trading ..."

I think for the best features we can also add Spot DXY (Dollar Index - 24 hours ) calculated from the standard formula when it comes to Forex or currencies. DXY index includes prices 5 major pairs & 1 exotic -- EURUSD, GBPUSD,USDJPY,USDCHF,USDCAD & USDSEK. All currencies with USD as Base currency will always be affected by DXY. its a simple measure of USD Strength in relative to other major currencies but might encode new information not available in asingle currency. We can also add EURX ( Euro Index ) , JPYX ( Yen Index ) , etc

mysl commented 6 years ago

@Kismuz , deepmind published a new paper , https://arxiv.org/abs/1807.03748, sounds interesting. Do you think it will help for the trading task?

Kismuz commented 6 years ago

@mysl , thanks, seems very interesting, need to take closer look.

mysl commented 6 years ago

enchancements to agent architecture:

casual convolution state encoder with attention for LSTM agent;
dropout regularization added

@Kismuz , do you have some test performance data w.r.t how much this helps with the generalization in test time? thanks!

Kismuz commented 6 years ago

@mys,

casual convolutions w/ attention rises convergence speed by ~10 -15% in number of iterations; think it can be improved by using multi-head attention;
dropout rates slightly improve generalisation if set to extremely high values e.g.. 0.25 < keep_prob < 0.5

ALevitskyy commented 6 years ago

Updates with regards to my issue:

-I did not manage to use 30M bars in a nice way, as just changing backtrader, sampling parameters and pandas resample causes errors when running the examples for some reasons. As a result to make it work, I just changed index on my resampled 30M bars dataset so it looks like 1M bars (as you said algorithm is data agnostic) for the Launcher. The only parameter I changed relative to the example is setting skip_frame to 1, so the algorithm makes decisions every 30M (every 1M as seen by the algorithm);

-I have not done tests yet (forgot to split sample into test/train:( ) but performance on the train set is OK.

With regards to algorithms, I was wondering, if early stopping may improve generalizability of the models? Like occasionally checking performance on the validation set and when the performance declines for N number of iterations terminate training. Could it be that after some number of iterations the algorithm stops recognizing patterns general to both test and train sets and starts learning train set only related patterns? I guess it would be hard to implement for now, as I could not find an easy way to run algorithm on test set rather than relaunching the launcher with new data.

Kismuz commented 6 years ago

@ALevitskyy , yes early stopping could help.

I guess it would be hard to implement for now, as I could not find an easy way to run algorithm on test set rather than relaunching the launcher with new data.

Actually it's quite easy. BTgym has built-in mechanism for algorithm to request train/test examples explicitly via extension arguments of env.reset() method and managing train/eval splits via data iterators via frozen_time_split param. Unfortunately I has not yet wrote documentation/examples on this topic but you can figure it out by examining code of btgym/research/encoder_test/EncoderClassifier class. Take a closer look at get_sample_config(), process_train() and process_eval() methods. To understand runner execution logic take a look at btgym.algorithm.synchro.BaseSynchroRunner class.

mysl commented 6 years ago

@Kismuz another interesting paper, https://arxiv.org/abs/1806.10474, the autoregressive discrete autoencoder method may be useful

ALevitskyy commented 6 years ago

Sorry for yet another basic question, but what is the difference between HOLD and CLOSE actions. Lets say I am initially long, and then the algorithm decides to Hold- does it mean it just closes the position?

Kismuz commented 6 years ago

HOLD means "do nothing"; CLOSE - close any open position

ALevitskyy commented 6 years ago

So if I have buy and next thing in a sequence is hold, I stay long?

Kismuz commented 6 years ago

Exactly. You can think of it as of skip-step

ALevitskyy commented 6 years ago

So, if algorithm buys in one period and then buys again the next period, does that mean it bought the pair twice, i.e. doubled up the position?

I ran a test, and the Tensorboard shows positive empirical return but the image shows very bad performance (I used the btgym version before the multiasset update). After altering the policy function so it keeps log of every action in a separate csv file and examining it, the performance is actually bad despite what Tensorboard claims. Did you encounter this kind of bugs in Tensorboard reports before?

ALevitskyy commented 6 years ago

Found my answer at issue #23 with regards to multiple stackes being possible

Kismuz commented 6 years ago

If we talk about 'empirical return' metric from tb monitor, it relates to algorithm mdp return and is not directly connected with 'portfolio return'. To judge performance one should go for 'total reward' and 'final value' metrics.

ALevitskyy commented 6 years ago

That was stupid by me... I get now what you mean by lack of generalization. Will see if I`ll be able with my current level of skill to program monitoring of performance on a hold-out set during training and possibly an implementation of early stopping.

ALevitskyy commented 6 years ago

Actually changing guided_a3c example to show test-episode statistics on the Tensorboard is very easy, given that you have done most of the work already. All I did was:

1) While initiating BTgymRandomDataDomain specified frozen_time_split={"year":2017,"month":1, "day":5, "hour":8, "minute":52,"second":0}, or any other timestamp specified as a dictionary (I can change the code so it would accept strings as well using pandas.to_datetime function)

2) Specified episode_train_test_cycle=[1,1] when initiating GuidedAAC

3) Made my own runner TestTrainRunner instead of using BaseSynchroRunner which inherits from it and overwrites get_data() function and added it using runner_config argument to GuidedAAC. The only thing I changed in get_data() was the way the function was assigning is_test variable (I think it should be changed in the main branch as well, then there would be no need to overwrite BaseSynchroRunner) using the code you commented out for some reason. I am just wondering why did you decide to use is_test = is_subdict(self.test_conditions, self.pre_experience) instead of commented out code? I deleted that line and uncommented the code while also deleting the extra condition and self.pre_experience['state']['metadata']['trial_type']? What is the trial_type argument responsible for and what extra information does it give relative to self.pre_experience['state']['metadata']['type']? I haven't checked if test summaries appear on TB if I just uncomment your code without removing the extra condition, but they appear after all the steps I described.

Do you think the test summaries generated are trustworthy? Did I do anything wrong? Going to check the behavior on real data soon. If everything is fine, I can make a pull request to changeBaseSynchroRunner, so it supports validation set and to BTgymBaseDataDomain to parse strings as well as dictionaries for frozen_time_split argument.

Thanks for good hints and for maintaining such a great package!

ALevitskyy commented 6 years ago

So have 2 problems with running the changes, described in previous posts:

1) Minor (resolved): in datafeed/base.py frozen_index = self.data.index.get_loc(self.frozen_time_split, method='ffill') pandas throughs a Key Error, which seems to be a bug in pandas. The way I resolved it was to pass self.frozen_time_split.timestamp() directly to self.set_global_timestamp()

2) The training and testing runs for some time with TB summaries working just fine before the following occurs and all the workers stop executing (check update below, resolved):

BTgymAPIshell_2: Unexpected environment response: No key received:Control mode: received <{'action': {'default_asset': 'hold'}}> Hint: forgot to call reset()? Hint: Forgot to call reset() or reset_data()? Traceback (most recent call last): File "~/btgym/btgym/envs/base.py", line 605, in _assert_response assert type(response) == tuple and len(response) == 4 AssertionError [2018-08-16 19:28:50.367393] ERROR: GuidedA3C_2: process() exception occurred

Not found a way to mute the problem or resolve it yet. Did you encounter this before?

UPDATE: Using BtgymDataset rather than BtgymRandomDataDomain solves the problem

Kismuz commented 6 years ago

@ALevitskyy, good work; I'll take some time to remember why I have commented out some code in SyncroRunner :/ Think it was attempt to implement more complex data iteration process to make global time ticking to make past data immediately accessible for training within just single step (minimising the time gap between train and test sets and imitating real-tome trading). This requires global_time incrementation (as opposite to freezing it as for now).

Unexpected environment response: No key received:

this error usually means no reset() was called from algorithm side prior to attempting making steps;

methenol commented 6 years ago

Hi @Kismuz, wonderful library you've been working on! I've been trying to get a different data source working (currently poloniex 5 min data) and have been running into some issues. Hoping you can point me in the right direction, not sure if it's the format of the data or if it's due to the 5 min intervals. I can get the data to load into backtrader, but not with btgym. I have it in a pandas dataframe before exporting it to a csv, using the formatting below:

df['date'] = pd.to_datetime(df['date'],unit='s').astype(datetime)
df['date'] = df['date'].dt.strftime('%Y%m%d %H%M%S')
df = df.rename(columns = {'date':'datetime'})
# df = df.sort_values(by='date',ascending=False)
df = df.set_index('datetime')
df.to_csv('./data.csv',
        sep=';',
        columns= ['open', 'high', 'low', 'close', 'volume'],
        header=False
)

The resulting CSV looks like this: 20170917 230500;3699.00009978;3699.00009978;3682.4;3688.02575368;101033.00853466 I've been trying to get the output as close to the example datasets as I can but may have missed something.

When I uncomment the sort line, btgym appears like it's working but the returns are always zero and the initial global_time is set to the most recent record in the dataset instead of the initial record. Here's some of the output:

[2018-09-01 00:45:31.335458] NOTICE: BTgymDataServer_0: Initial global_time set to: 2018-08-31 20:30:00 / stamp: 1535761800.0 INFO:root:Finished episode 1 after 2848 timesteps. Steps Per Second 363.44 INFO:root:Episode reward: 0.0 INFO:root:Average of last 500 rewards: 0.00 INFO:root:Average of last 100 rewards: 0.00

It goes on with 0.0 rewards forever. With all of the example datasets in your library there's a good amount of negative returns for the first 100 or so episodes then it's mostly positive and ends up converging around 10-20 mil steps using a tensorforce PPO agent

When I uncomment the line, the initial global_time is the start of the dataset, but I get this error:

[2018-08-31 04:21:14.204132] NOTICE: BTgymDataServer_0: Initial global_time set to: 2017-09-17 19:10:00 / stamp: 1505689800.0 [2018-08-31 04:21:14.264186] ERROR: SimpleDataSet_0: Quitting after 101 sampling attempts. Full sample duration: 9 days, 23:50:00 Total sample time gap: 7 days, 23:51:00 Sample start time: 2017-10-12 00:00:00 Sample finish time: 2017-10-21 23:50:00 Hint: check sampling params / dataset consistency.

I know the hint is obvious, I'm just not sure where to go from here. Any ideas on what to try? I can provide more data or turn debug on and provide output if it would help (just makes for a longer post) I've tried setting timeframe=5 both when setting the env and directly in base.py but get the same results

Kismuz commented 6 years ago

@methenol , pls review #8 #13 #25 #40

in a glance, for your data:

strategy scaling parameter should be adjusted to fit absolute price values of your instrument
ensure broker amount and leverage set to make trading possible
data inconsistency means iterator was unable to find continuous data segment within parameters specified (~episode length and maximum skip_time parameter); later is simply sample size tolerance i.e. id you set episode length 1 day and skip_time param to 6 hours iterator accept any continuous segment of records of length from 18 (24 -6) to 24 h. Iterator makes 100 samples with random start time and if not a single one fits it throws an exception.

ALevitskyy commented 6 years ago

Hi @methenol, I had very similar problem and after spending some long time did not manage to make it work "properly". I wanted to use 5M data as well, so what I did was to change index on 5M pandas dataframe so it looks to Btgym as if it was 1M data. The code I used was: import pandas as pd time_index = pd.to_datetime(list(range(0,len(data.index))),unit='m') data.index=time_index data.to_csv("data.csv",header=False,sep=";") which creates a data frame which would look to Btgym as if it was 1M data starting 1970. Then it works fine on default settings from guided_a3c example, I am not sure about other examples. Works fine if you carefully change some other settings. I know it is not a good solution, but I am happy with it as a temporary workaround.

ALevitskyy commented 6 years ago

With regards to the test set tensor board summaries, while they show up on Tensorboard they are wrong. It seems like the algorithm trains on test examples as well, so displayed test performance is as good as train performance. A simple work-around would be to set episode_train_test_cycle=[1000,1] then algorithm would not have enough examples to fit test data. Will let you know if I find a problem in the modules imported in guided_a3c.ipynb

methenol commented 6 years ago

Thank you both for the help Ended up resampling to 1min intervals. This keeps the date intact and pads the missing entries when it resamples. I'll look into using the skip frames feature so there's less fluff.

        df['date'] = pd.to_datetime(df['date'], unit='s')  
        df['date'] = df['date'].astype(datetime)  
        df = df.set_index('date')
        df = df.resample('1T').pad()
        df = df.sort_index(ascending=True)
        df.index = df.index.strftime('%Y%m%d %H%M%S')

        df.to_csv('./data.csv',
                sep=';',
                columns= ['open', 'high', 'low', 'close', 'volume'],
                header=False
        )

The environment is now issuing rewards as expected. Guessing the hangup occurred when the strategy interacts with the broker, wrong date = prices that would never get satisfied by the order so the returns were 0.0. Additionally the fixed_stake and start cash were defined incorrectly for this asset in my config, thank you @Kismuz

I tested this with 5m and 30m data. It resampled it to 1m and the environment worked as expected.

Keep in mind this is for data with unix timestamps as the date.

ALevitskyy commented 6 years ago

Even setting up episode_train_test_cycle=[1000,1] does not help to solve the problem I referred in my previous comment so I presume that the problem is actually that test data is sampled from the train set or frozen_time_split does not actually do anything. I will check if this is the case today or tomorrow. Do you have any ideas on how the problem can be solved? What files other than those in datafeed folder, I need to look at?

Kismuz commented 6 years ago

@ALevitskyy , There is two possible causes of data leakage:

Data iterator implementation bug causing test data to be sampled into train episodes;
Algorithm bug, e.g. when gradients get computed on test data and somehow sent to parameter server;

One need to eliminate iterators bug possibility in first place: There are several tests that has been performed on data iterator classes to ensure correctness. You can find it at: https://github.com/Kismuz/btgym/blob/master/btgym/datafeed/test_data.py https://github.com/Kismuz/btgym/blob/master/btgym/datafeed/test_casual_data.py

those are pretty straightforward if you familiar with unittest suite: just making a lot of samples both from test and train sets and explicitly verifying data time consistency. All data classes commited to package have passed these. But you never know for sure :[ Check it out and run with your data and settings to see if test passes.

If it is ok with data iteration - need to see what's wrong with algorithm. BTW, you can just set episode_train_test_cycle=[1,0] to prevent test data sampling at all.

Kismuz commented 6 years ago

@ALevitskyy, there was indeed sampling bug in aac.py that forced test episode being sampled from train set. Quite a stupid thing, I apologise for that. It's fixed by now, you should update btgym. Let me know if it has solved your issue.

ALevitskyy commented 6 years ago

Vow, great news! Checked the commit, didn't spot this one. In the end I gave up looking for an error because I don't know the software that well and have started a new job, so have less time, but well done. Will have a look at whether it is worth trying to implement early stopping.

ALevitskyy commented 6 years ago

Running on my dataset, I seem to get "ERROR: Trial_0: Train subset should contain at least one sample, got: train_set size: 1440 rows, sample_size: 2870 rows", does the code runs well on yours?

Kismuz commented 6 years ago

yes. Can yo share your setup details: stat on data (num records, time span), dataset class and settings ( episode duration, trial duration train/test split etc.?) Here is the example of my correctly working code (data file contains three days of 1 min data):

# Data setup:
filename='./data/CRYPTO_M1_201809_biased_1e-5.csv'

parsing_params = dict(
    # CSV source specific parsing params:
    sep=',',
    header=0,
    index_col=0,
    parse_dates=True,
    names=['open'],
    timeframe=1,  # 1 minute.
    datetime=0,
    open=1,  # only single value used 
    high=-1,
    low=-1,
    close=-1,
    volume=-1,
    openinterest=-1,
)

domain = BTgymDataset(
    filename=filename,
    episode_duration={'days': 0, 'hours': 22, 'minutes': 0},
    time_gap={'days': 0, 'hours': 12},  # episode duration tolerance
    start_00=False,
    start_weekdays={0, 1, 2, 3, 4, 5, 6},
    parsing_params=parsing_params,
    target_period={'days': 1, 'hours': 0, 'minutes': 0},  # reserve 1 final day as test set
)

Kismuz commented 6 years ago

this error says your entire train (it actually can be test )set contains less records than required for single episode, given duration you set.

ALevitskyy commented 6 years ago

Thanks, it now works on my dataset with your settings. I think I set up test_sample_duration larger than default target period. I will leave it to run for a day and will check tensorboard summaries to see whether the sampling problem was sorted. Btw, what is the difference between test_period (part of sampling params) and target_period? Can I just leave test_period zero?

Kismuz commented 6 years ago

for BTgymDataset class those are the same.

ALevitskyy commented 6 years ago

I confirm, that sampling is fixed and that at the current point there is no point in implementing early stopping according to the results I got. I think you can close this GitHub issue, if I have questions I can open a new one. In the meanwhile, I am looking at papers on simple linear online learning algorithms, which may be useful either as a learner itself or for feature engineering: 1) https://arxiv.org/abs/1507.07147 2) https://arxiv.org/abs/1611.02365 3) https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/12135/11816 which can be combined with the following papers to change learning rates adaptively: 4) https://arxiv.org/pdf/1206.1106 5) https://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/5092/5494

My idea is to make several adaptive ARIMA or EC-VARMA learners for different time-series frequencies (5M,30M,1H,4H) and combine them linearly with TD reinforcement learning algorithm. Will see if it flies

Kismuz commented 6 years ago

@ALevitskyy, Idea sounds promising, pls share your progress.

Kismuz / btgym

Using 5-,10-,30-minute data feed rather than 1-minute and the timeframe argument in BTgymRandomDataDomain #54