Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

BTgymMultiData - Sync between different data stream #131

Closed: JaCoderX closed this issue 4 years ago

JaCoderX commented 4 years ago

@Kismuz, until now I have been working with a couple of data streams that share the same 1-minute timeframe. Recently I had the idea of adding a different data stream (data as information) with a 1-day timeframe.

When I ran the model with the new data stream, I got an AssertionError: assert self.train_num_records + self.max_gap_num_records >= self.sample_num_records

I did some digging in the code and found that the problem occurs in the intersection operation that tries to sync all the data streams: https://github.com/Kismuz/btgym/blob/78be85c74a471da37d41ea33d72505411100f01b/btgym/datafeed/multi.py#L158-L173

So when the operation looks for a time intersection between 1-minute bars and 1-day bars, it basically reduces the data to a 1-day timeframe, and self.train_num_records shrinks by roughly the ratio between the two timeframes, hence the assertion error.

I tried commenting out the whole intersection part of the code, but that gives an error when sampling the data (it might be a simple error, but I didn't try changing more code).
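
The shrinkage described above can be reproduced with a minimal standard-library sketch. It does not use BTgym's own code: the two streams, the `make_index` helper, and the values chosen for `sample_num_records` and `max_gap_num_records` are hypothetical, and treating every intersected row as a training record is a simplifying assumption.

```python
from datetime import datetime, timedelta


def make_index(start, step, count):
    """Return `count` timestamps spaced `step` apart, beginning at `start`."""
    return [start + i * step for i in range(count)]


start = datetime(2019, 1, 1)
days = 30

# Hypothetical streams: 30 days of 1-minute bars and 30 daily bars.
minute_index = make_index(start, timedelta(minutes=1), days * 24 * 60)
daily_index = make_index(start, timedelta(days=1), days)

# Keep only the timestamps present in both streams.
common = sorted(set(minute_index) & set(daily_index))

print(len(minute_index))  # 43200
print(len(daily_index))   # 30
print(len(common))        # 30

# Hypothetical values plugged into the condition from the failing assertion.
sample_num_records = 1000
max_gap_num_records = 0
train_num_records = len(common)  # simplification: all common rows are training rows
check_passes = train_num_records + max_gap_num_records >= sample_num_records
print(check_passes)  # False
```

Only the midnight timestamps survive the intersection, so a sample size that was reasonable for 1-minute data can no longer be satisfied.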

I can understand the logic of syncing different data streams of the same timeframe to keep everything in order, but not for different timeframes.

Any thoughts? Am I missing something here?

Kismuz commented 4 years ago

@JacobHanouna,

> I can understand the logic of syncing different data streams of the same timeframe to keep everything in order, but not for different timeframes.

JaCoderX commented 4 years ago

Is there a true necessity for the intersection sync? I mean, what if I remove it, or do some kind of sync by timeframe groups?

JaCoderX commented 4 years ago

@Kismuz, just to add more information to the topic.

If I use the Backtrader native resample filter from within the Strategy class, it syncs and performs well with BTgym. So I know BTgym can handle working with different timeframes.

As a workaround I can upsample the data to 1 min outside of BTgym and then resample it back in the strategy.

This shows me that there is no fundamental reason why different timeframes shouldn't work together. Together with the fact that the intersection sync happens only in BTgymMultiData, it made me question whether it is really needed.
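
The upsampling half of the workaround mentioned above might look like the following sketch, which forward-fills each daily value onto a 1-minute grid before the data is handed to BTgym. The `upsample_ffill` helper and the sample values are hypothetical, and only the standard library is used.

```python
from datetime import datetime, timedelta


def upsample_ffill(daily_rows, step=timedelta(minutes=1)):
    """Repeat each daily value on every `step` until the next daily timestamp.

    `daily_rows` is a list of (timestamp, value) pairs sorted by timestamp.
    The last row is extended for one full day.
    """
    out = []
    for i, (ts, value) in enumerate(daily_rows):
        if i + 1 < len(daily_rows):
            end = daily_rows[i + 1][0]
        else:
            end = ts + timedelta(days=1)
        t = ts
        while t < end:
            out.append((t, value))
            t += step
    return out


# Hypothetical daily "data as information" stream.
daily = [
    (datetime(2019, 1, 1), 10.0),
    (datetime(2019, 1, 2), 12.5),
]

minute_rows = upsample_ffill(daily)
print(len(minute_rows))   # 2880
print(minute_rows[1439])  # 2019-01-01 23:59 carries the first day's value, 10.0
print(minute_rows[1440])  # 2019-01-02 00:00 carries the second day's value, 12.5
```

After upsampling, the stream has the same 1-minute timestamps as the other streams, so an intersection no longer discards rows. One caveat: if a daily value only becomes known at the end of its day, it should be shifted forward before filling so that earlier minutes do not see it ahead of time.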

Kismuz commented 4 years ago

Intersection is just the simplest method to ensure the NN model gets consistent inputs on every timestep. If you can manage to feed different streams consistently by using the backtrader resampling feature, that is just fine.

JaCoderX commented 4 years ago

> As a workaround I can upsample the data to 1 min outside of BTgym and then resample it back in the strategy.

Over the last couple of days I tested this workaround idea, and it seems to work quite well.

IMO there is a valid case for working with data from different timeframes, simply because not all 'data as information' can be collected at such low timeframes. So for real-world applications it is necessary to have an option to use all available data, even if it comes at different time resolutions.

For now the Backtrader resample option is a good workaround, but it comes with a noticeable performance hit because this expensive operation needs to be done on every new sample.
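
For illustration, the resample-back step can be pictured as collapsing 1-minute rows into one value per day. The sketch below is a plain-Python stand-in, not Backtrader's resampler; the `resample_daily_last` helper and the input rows are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby


def resample_daily_last(minute_rows):
    """Collapse sorted (timestamp, value) rows into one (date, last value) row per day."""
    out = []
    for day, rows in groupby(minute_rows, key=lambda row: row[0].date()):
        _, last_value = list(rows)[-1]
        out.append((day, last_value))
    return out


# Hypothetical upsampled stream: two days of 1-minute rows.
start = datetime(2019, 1, 1)
minute_rows = [
    (start + timedelta(minutes=i), 10.0 if i < 1440 else 12.5)
    for i in range(2880)
]

daily = resample_daily_last(minute_rows)
print(daily)
```

Every row of the 1-minute stream has to be visited to rebuild the daily values, which gives a sense of why repeating this work on every new sample adds up.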