Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0
985 stars 260 forks

Custom Dataset Issue, similar to #8 #113

Closed: munkarkin96 closed this issue 4 years ago

munkarkin96 commented 5 years ago

Hi Andrew, I apologise that this problem has resurfaced; I tried solving it but I keep running into issues. The goal is to feed a custom dataset to Cerebro. I have essentially followed the steps outlined in #8 and took reference from #31 and #25.

Error Message:

```
[2019-07-05 13:45:59.702520] ERROR: SimpleDataSet2_0: Data file <data/RL_data.csv> not specified / not found / parser error.

ParserError                               Traceback (most recent call last)
~/Library/Mobile Documents/com~apple~CloudDocs/Y4S1/EE4002R/reinforcement_learning/btgym/btgym/datafeed/base.py in read_csv(self, data_filename, force_reload)
    445                     parse_dates=self.parse_dates,
--> 446                     names=self.names,
    447                 )

~/environments/ml_dir/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    701
--> 702     return _read(filepath_or_buffer, kwds)
    703

~/environments/ml_dir/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    434     try:
--> 435         data = parser.read(nrows)
    436     finally:

~/environments/ml_dir/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
   1138         nrows = _validate_integer('nrows', nrows)
-> 1139         ret = self._engine.read(nrows)
   1140

~/environments/ml_dir/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
   1994         try:
-> 1995             data = self._reader.read(nrows)
   1996         except StopIteration:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()

ParserError: Too many columns specified: expected 5 and found 1

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
in
----> 1 MyDataset.read_csv()

~/Library/Mobile Documents/com~apple~CloudDocs/Y4S1/EE4002R/reinforcement_learning/btgym/btgym/datafeed/base.py in read_csv(self, data_filename, force_reload)
    461             msg = 'Data file <{}> not specified / not found / parser error.'.format(str(filename))
    462             self.log.error(msg)
--> 463             raise FileNotFoundError(msg)
    464
    465         self.data = pd.concat(dataframes)

FileNotFoundError: Data file not specified / not found / parser error.
```

Steps to reproduce:

```python
from btgym import BTgymEnv, BTgymDataset
from btgym.datafeed.derivative import BTgymDataset2, BTgymRandomDataDomain
import backtrader.feeds as btfeeds
from gym import spaces  # import added here; elided in my original snippet


class ExtraPandasDirectData(btfeeds.PandasDirectData):
    lines = ('width',)
    params = (
        ('width', 4),
    )
    datafields = btfeeds.PandasData.datafields + (['width'])


class ExtraLinesDataset(BTgymDataset):

    def to_btfeed(self):
        """
        Overrides default method to add custom datalines.
        Performs BTgymDataset --> bt.feed conversion.
        Returns bt.datafeed instance.
        """
        try:
            assert not self.data.empty
            btfeed = ExtraPandasDirectData(
                dataname=self.data,
                timeframe=self.timeframe,
                datetime=self.datetime,
                open=self.open,
                high=self.high,
                low=self.low,
                close=self.close,
                volume=self.volume,
                adj_close=1,
                adj_close_norm=2,
                pred_adj_close=3,
                diff=4,
            )
            btfeed.numrecords = self.data.shape[0]
            return btfeed

        except:
            msg = 'BTgymDataset instance holds no data. Hint: forgot to call .read_csv()?'
            self.log.error(msg)
            raise AssertionError(msg)


params = dict(
    # CSV to Pandas params:
    sep=',',
    header=0,
    index_col=0,
    parse_dates=True,
    names=['adj_close', 'adj_close_norm', 'pred', 'diff',
           'open', 'high', 'low', 'close', 'volume'],

    # Pandas to BT.feeds params:
    datetime=0,
    nullvalue=0.0,
    timeframe=1,
    high=6,
    low=7,
    open=5,
    close=8,
    volume=9,
    openinterest=-1,
    adj_close=1,
    adj_close_norm=2,
    pred_adj_close=3,
    diff=4,

    # Random-sampling params:
    start_weekdays=[0, 1, 2, 3],  # Only weekdays from the list will be used for episode start.
    start_00=True,                # Episode start time will be set to first record of the day (usually 00:00).
    episode_len_days=1,           # Maximum episode time duration in days, hours, minutes:
    episode_len_hours=23,
    episode_len_minutes=55,
    time_gap_days=0,              # Maximum data time gap allowed within sample in days, hours. Thereby,
    time_gap_hours=5,             # if set to be < 1 day, samples containing weekends and holidays gaps will be rejected.
)

MyDataset = BTgymDataset2(
    filename='data/RL_data.csv',
    **params,  # leave all other to defaults
)

MyDataset.read_csv()  # <--- Error starts here

env = BTgymEnv(
    filename=MyDataset,
    # state_shape={'raw_state': spaces.Box(low=-100, high=100, shape=(30, 9))},
    state_shape={'raw': spaces.Box(low=-100, high=100, shape=(30, 9))},
    skip_frame=5,
    start_cash=100000,
    broker_commission=0.02,
    fixed_stake=100,
    drawdown_call=90,
    render_ylabel='Price Lines',
    render_size_episode=(12, 8),
    render_size_human=(8, 3.5),
    render_size_state=(10, 3.5),
    render_dpi=75,
    verbose=0,
)
```

Would you mind pinpointing the fix? Thanks!
Kismuz commented 5 years ago

@munkarkin96, your traceback log lists a pandas loader error:

ParserError: Too many columns specified: expected 5 and found 1

Did you confirm that the actual columns, delimiter, and date-time format of your file match your column-parsing specs?
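One way to run this check outside btgym is to call `pandas.read_csv` directly with the same kind of spec the dataset params declare. The sketch below uses a synthetic two-row stand-in for `data/RL_data.csv` and names every field explicitly, including the datetime column, to avoid any ambiguity about which field becomes the index. As a side note, "expected 5 and found 1" usually means pandas split each row into a single field, which points at a delimiter mismatch between the file and `sep=','`.

```python
import io
import pandas as pd

# Synthetic two-row stand-in for data/RL_data.csv:
# one datetime field followed by 9 value columns.
csv_text = (
    "2019-07-01 00:00,1,2,3,4,5,6,7,8,9\n"
    "2019-07-01 00:01,1,2,3,4,5,6,7,8,9\n"
)

# Name every field, datetime included, so index_col=0 is unambiguous.
names = ['datetime', 'adj_close', 'adj_close_norm', 'pred', 'diff',
         'open', 'high', 'low', 'close', 'volume']

# Same style of spec BTgymDataset forwards to pandas; header=None here
# only because the synthetic sample has no header row.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=',',
    header=None,
    index_col=0,
    parse_dates=True,
    names=names,
)

print(df.shape)        # expect (2, 9): two rows, nine data columns
print(type(df.index))  # expect a DatetimeIndex
```

If this raises `ParserError` on the real file (swap `io.StringIO(csv_text)` for the file path), the delimiter or column count in the file does not match the declared spec.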

munkarkin96 commented 5 years ago

@Kismuz Thanks for your reply. Unfortunately, yes, the specs are correct. The file is actually supposed to have 9 columns excluding the dates, so I am not sure why the error says

expected 5

Kismuz commented 5 years ago

The first step to debug it is to feed your data via a standalone bt.btfeeds.PandasDirectData instance, to check whether backtrader can parse it on its own.

Kismuz commented 4 years ago

Closed due to long inactivity period.