Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

Train Test routine sampling - IndexError #129

Closed: mobias17 closed this issue 4 years ago

mobias17 commented 4 years ago

Hi @Kismuz

This defect was first reported as part of #95, raised by @JacobHanouna, who points out the following traceback ending in an index -1 error:

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/jack/btgym/btgym/dataserver.py", line 176, in run
    sample = self.get_data(sample_config=service_input['kwargs'])
  File "/home/jack/btgym/btgym/dataserver.py", line 88, in get_data
    sample = self.dataset.sample(sample_config)
  File "/home/jack/btgym/btgym/datafeed/multi.py", line 219, in sample
    master_sample = self.master_data.sample(kwargs)
  File "/home/jack/btgym/btgym/datafeed/base.py", line 539, in sample
    return self._sample(kwargs)
  File "/home/jack/btgym/btgym/datafeed/base.py", line 617, in _sample
    kwargs
  File "/home/jack/btgym/btgym/datafeed/base.py", line 900, in _sample_interval
    sample_len = (sampled_data.index[-1] - sampled_data.index[0]).to_pytimedelta()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/datetimelike.py", line 403, in getitem
    val = getitem(key)
IndexError: index -1 is out of bounds for axis 0 with size 0

Since I think this is a different issue from #95, I opened a new one. It can be merged if it turns out to have the same cause.

Having taken a look at this issue, I suspect it is caused by the test_period. Namely, when I set episode_train_test_cycle = [1,0] (train sampling only), I don't run into the error. With episode_train_test_cycle = [0,1] (test sampling only), the error appears. I tested with test_period given both as a dict and as -1.

My analysis so far: the test_period values are correctly passed to the top-level class BTgymRandomDataDomain. However, when debugging the BTgymBaseData class, self.test_period is always None in the _reset method, so the sample interval is [0,0], nothing is sampled, and the traceback follows.
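To illustrate the failure mode described here, a minimal sketch in plain Python (not btgym code): slicing with a [0,0] interval yields an empty selection, and taking its last element raises an IndexError, just as the last frames of the traceback do. The names timestamps, sample_length and sample_length_checked are hypothetical and exist only for this example; the checked variant shows one possible way to fail early with a clearer message.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the loaded data index: one timestamp per minute.
timestamps = [datetime(2017, 1, 1) + timedelta(minutes=i) for i in range(10)]


def sample_length(index, interval):
    """Return the time span covered by index[interval[0]:interval[1]]."""
    sampled = index[interval[0]:interval[1]]
    # On an empty selection, sampled[-1] raises IndexError.
    return sampled[-1] - sampled[0]


def sample_length_checked(index, interval):
    """Same computation, but name the offending interval before indexing."""
    sampled = index[interval[0]:interval[1]]
    if not sampled:
        raise ValueError(
            "empty sample for interval {}; check the period settings".format(interval)
        )
    return sampled[-1] - sampled[0]


# A non-empty interval works as expected.
print(sample_length(timestamps, [0, 5]))

# The [0, 0] interval selects nothing, so indexing with -1 fails.
try:
    sample_length(timestamps, [0, 0])
except IndexError as err:
    print("IndexError:", err)
```

Whether such a check belongs in the library itself is a design decision for the maintainer; the sketch only shows why an all-zero interval surfaces as an index error far from the place where the period was configured.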

I haven't found out yet why the params are not passed or set properly, and could use some help on that. Can anyone please confirm that in BTgymBaseData the value of self.test_period is not correctly passed (maybe I also just mishandled the train_test_routine ;-) )? To do so, just inspect self.test_period in the _reset method (e.g. before line 364).

Kismuz commented 4 years ago

@mobias17, you understand it correctly: [0, 0] is definitely not a correct value.

To do so just debug the self.test_period in the _reset method (e.g. before line 364)?

yup

mobias17 commented 4 years ago

@Kismuz I looked a bit deeper into the issue. In the end, it seems it is not an error.

Namely, the parameter self.test_period in BTgymBaseData is not set by the test_period entry of the trial_params dict passed to BTgymRandomDataDomain. It is set by the target_period argument of the BTgymRandomDataDomain class, which is therefore what controls the data sampling. It is adjusted in the "Hacky Trick" part of BTgymRandomDataDomain.
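The routing described above can be pictured with a heavily simplified sketch. The classes BaseData and RandomDataDomain below are hypothetical stand-ins written for this example, not btgym's actual implementation, and the 'days'/'hours'/'minutes' keys are assumed for illustration. The sketch only mirrors the observation that the base class receives its test_period from the target_period argument, while the test_period entry inside trial_params does not reach it.

```python
class BaseData:
    """Hypothetical, simplified stand-in for the base data class."""

    def __init__(self, test_period=None):
        # Only this argument decides what the base class holds out for testing.
        self.test_period = test_period

    def test_interval_is_empty(self):
        # None or an all-zero period means nothing can be sampled for testing.
        period = self.test_period or {}
        return not any(period.values())


class RandomDataDomain(BaseData):
    """Hypothetical stand-in for the top-level domain class."""

    def __init__(self, target_period=None, trial_params=None):
        # trial_params (including its 'test_period' entry) is kept on the
        # instance, but it is not forwarded as the base-class test_period.
        self.trial_params = dict(trial_params or {})
        super().__init__(test_period=target_period)


trial_params = {"test_period": {"days": 2, "hours": 0, "minutes": 0}}

# target_period left at its default: the base class sees None.
defaulted = RandomDataDomain(trial_params=trial_params)
print(defaulted.test_period, defaulted.test_interval_is_empty())

# target_period given explicitly: the base class sees a usable period.
configured = RandomDataDomain(
    target_period={"days": 29, "hours": 0, "minutes": 0},
    trial_params=trial_params,
)
print(configured.test_period, configured.test_interval_is_empty())
```

Under these assumptions, setting only trial_params['test_period'] leaves the base-class value untouched, which matches the None seen while debugging _reset.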

So when tests are run, the target_period argument cannot be left at its default of None, as it is in the A3C example. I am still a bit confused about the difference between test_period, target_period, trial_period, train_period etc. and how to set the values correctly, despite the chart in the documentation... But in the end the issue seems invalid, so I am closing it.