Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

Question: Pass data to BTgymEnv directly as pandas.DataFrame #104

Closed deandreee closed 5 years ago

deandreee commented 5 years ago

The 25.01.2019 update notes say:

data_feed classes now accept pd.dataframes as historic data source via dataframe kwarg (was: .csv files only);

I assume this means I no longer have to pass a csv file and can pass a pd.DataFrame object instead. Unfortunately, I was not able to find any examples of this.

I tried two approaches. First:

filename = "./examples/data/TEST_3k.csv"
pd_dataframe = pd.read_csv(filename)
env = BTgymEnv(
    datafeed_args={"dataframe": pd_dataframe},
...

Second:

pd_dataframe = pd.read_csv(filename)
env = BTgymEnv(
    dataframe=pd_dataframe
...

Both of them throw this error:

Traceback (most recent call last):
  File "/home/and/Desktop/code/btgym/btgym/dataserver.py", line 134, in run
    assert not self.dataset.data.empty
AttributeError: 'NoneType' object has no attribute 'empty'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/and/Desktop/code/btgym/btgym/dataserver.py", line 137, in run
    self.dataset.read_csv()
  File "/home/and/Desktop/code/btgym/btgym/datafeed/base.py", line 436, in read_csv
    for filename in self.filename:
TypeError: 'NoneType' object is not iterable

I would really appreciate a simple example of how to accomplish this. I'm new to Python, so sorry if the solution is simple or obvious. Thanks!

Kismuz commented 5 years ago

@deandreee, sorry for the late reply:

  1. when one loads a csv file into a dataframe, it is essential to provide the dataset as an instance and to pass consistent parsing parameters to the pd.read_csv() method (when data is loaded inside DataSet, it does so automatically):

    filename1 = './data/BINANCE_BTC_USD_20181003_20181112.csv'

    csv_parsing_params = dict(
        sep=',',
        header=0,
        index_col=0,
        parse_dates=True,
        names=['open', 'high', 'low', 'close', 'volume'],
    )
    asset1 = pd.read_csv(filename1, **csv_parsing_params)

    # Data setup:
    btfeed_params = dict(
        datetime=0,
        nullvalue=0.0,
        timeframe=1,
        high=1,
        low=2,
        open=3,
        close=4,
        volume=5,
        openinterest=-1,
    )
    parsing_params = {}
    parsing_params.update(csv_parsing_params)
    parsing_params.update(btfeed_params)

    dataset = BTgymDataset(
        dataframe=asset1,
        train_episode_duration=dict(days=3, hours=23, minutes=59),
        test_episode_duration=dict(days=1, hours=23, minutes=59),
        time_gap=dict(days=1, hours=0, minutes=0),
        parsing_params=parsing_params,
    )

    env = BTgymEnv(dataset=dataset)


see also:

https://github.com/Kismuz/btgym/blob/master/examples/data_domain_api_intro.ipynb

https://github.com/Kismuz/btgym/blob/master/examples/setting_up_environment_full.ipynb

deandreee commented 5 years ago

Thanks, but still getting the same error. From what I understand, it throws here:

        try:
            assert not self.dataset.data.empty

From my debugging, it looks like self.dataset is defined but self.dataset.data is None. Maybe the problem is in my csv? It should be in the original format, but here is a sample anyway (I switched sep to ";" in csv_parsing_params, btw):

20160103 170000;1.087010;1.087130;1.087010;1.087130;0
20160103 170100;1.087120;1.087120;1.087120;1.087120;0
20160103 170200;1.087080;1.087220;1.087080;1.087220;0

Kismuz commented 5 years ago

@deandreee can you please provide a link to your [possibly partial] data file? Replicating the error locally is the easiest way to solve it.

deandreee commented 5 years ago

@Kismuz It's basically the same data from /examples/data/, for example DAT_ASCII_EURUSD_M1_2016. I tried multiple other files and the result (the same error) did not change.
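One way to rule out the CSV itself is to parse a few rows with pandas alone, before btgym is involved. The sketch below is only an illustration: it reads the three sample rows quoted earlier from memory, assumes the file has no header row, and borrows the column order open, high, low, close, volume from the parsing parameters suggested above. The explicit timestamp format is also an assumption based on how those rows look.

```python
import io

import pandas as pd

# The three sample rows quoted earlier in the thread, held in memory instead of a file.
sample = io.StringIO(
    "20160103 170000;1.087010;1.087130;1.087010;1.087130;0\n"
    "20160103 170100;1.087120;1.087120;1.087120;1.087120;0\n"
    "20160103 170200;1.087080;1.087220;1.087080;1.087220;0\n"
)

# Assumption: no header row in the file, so header=None keeps every line as data.
columns = ["datetime", "open", "high", "low", "close", "volume"]
frame = pd.read_csv(
    sample,
    sep=";",
    header=None,
    names=columns,
    dtype={"datetime": str},
)

# Parse the "YYYYMMDD HHMMSS" timestamps explicitly and use them as the index.
frame["datetime"] = pd.to_datetime(frame["datetime"], format="%Y%m%d %H%M%S")
frame = frame.set_index("datetime")

# A non-empty frame with a datetime index suggests the CSV is not the problem.
print(frame.empty)
print(frame.shape)
print(frame.index[0])
```

If the frame parses cleanly and is not empty, the error most likely comes from how the dataset object handles the dataframe rather than from the data.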

deandreee commented 5 years ago

Looks like I finally got it working. After some manual debugging with print() (VS Code doesn't support multiprocess debugging yet, unfortunately), I found that the BTgymDataset class doesn't actually use the dataframe argument. But there is BTgymDataset2, which does! I switched to that class, and now it looks like it's working:

dataset = BTgymDataset2(
    dataframe=asset1,
    parsing_params=parsing_params,
)

Kismuz commented 5 years ago

@deandreee, oh yes indeed, my fault. I forgot to add a deprecation warning to the DataSet class. DataSet2 was created precisely as an upgrade to DataSet with dataframe input support. Sorry it took so much time.
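For readers trying to follow why the traceback ended in a TypeError about filename, here is a rough sketch of the failure mode. It is not btgym code: DatasetStandIn and serve are hypothetical names, and the set of exceptions caught is an assumption. Only the emptiness check and the loop over self.filename come from the traceback above.

```python
class DatasetStandIn:
    """Hypothetical stand-in, not btgym code: accepts a dataframe but never stores it."""

    def __init__(self, filename=None, dataframe=None):
        self.filename = filename  # stays None when only a dataframe is passed
        self.data = None          # the dataframe argument is silently ignored

    def read_csv(self):
        # Same loop as the line shown in the traceback.
        for filename in self.filename:
            pass


def serve(dataset):
    """Mimics the check-then-fallback order visible in the traceback."""
    try:
        assert not dataset.data.empty  # AttributeError, because data is None
    except (AssertionError, AttributeError):  # assumption: which errors are caught
        dataset.read_csv()  # TypeError, because filename is None


try:
    serve(DatasetStandIn(dataframe=object()))
except TypeError as error:
    message = str(error)
    print(message)  # 'NoneType' object is not iterable
```

Because the dataframe is dropped, the fallback tries to read from files that were never given, which matches the second error in the original report.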

deandreee commented 5 years ago

No problem, glad it's solved. I'm closing the issue.