PandasData fails to autodetect columns in case the input dataframe uses indexes instead of names for columns

backtrader2 / backtrader

Python Backtesting library for trading strategies

GNU General Public License v3.0


Community discussion:

https://community.backtrader.com/topic/3106/attributeerror-numpy-int64-object-has-no-attribute-lower-when-using-pandas-with-the-noheaders-argument

Suspected Regression Commit:

05051890efe527ee22919d354038b4dfc1ffe7ca: Rework ix -> iloc pull request and autodetection algorithm in PandasData

Description:

PandasData accepts a pandas DataFrame as input and in some cases tries to auto-detect its columns in order to associate each DataFrame column with the correct data line. Usually it does this using the DataFrame's column names. However, a DataFrame may have no column names and use integer index values as column labels instead. In that case PandasData should associate the DataFrame columns with the data lines using a predefined index mapping.

Apparently this mechanism fails to work after the aforementioned commit.

Test case:

Run data-pandas.py --noheaders from the sample directory and observe the failure.

I'm finding that lines 211-212 of pandafeed.py throw the error during the start phase:

# Transform names (valid for .ix) into indices (good for .iloc)
if self.p.nocase:
    colnames = [x.lower() for x in self.p.dataname.columns.values]
else:
    colnames = [x for x in self.p.dataname.columns.values]

With numerical headers, this code calls the string method lower() on every column label whenever self.p.nocase is True, which is the default. Integer labels (numpy.int64) have no such method, hence the AttributeError.
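The failure mode can be reproduced without backtrader or pandas. The sketch below is only an illustration: plain Python ints stand in for the numpy.int64 labels (an assumption; both lack a lower() method), and lowercase_all is a hypothetical helper that mimics the unguarded comprehension.

```python
# Integer column labels, standing in for the numpy.int64 labels of a
# DataFrame without headers (assumption: plain ints fail the same way).
columns = [0, 1, 2, 3, 4, 5]


def lowercase_all(labels):
    """Hypothetical helper mimicking the unguarded comprehension."""
    return [x.lower() for x in labels]


try:
    lowercase_all(columns)
    error = None
except AttributeError as exc:
    # Integers have no .lower() method, so the comprehension fails
    # on the very first label.
    error = str(exc)

print(error)
```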

This can be worked around by setting self.p.nocase = False, which bypasses this step. However, the code should detect whether it is dealing with strings or numbers, so that it does not depend on the user setting this parameter to False.

One solution that appears to work is to add a string check right in the list comprehension:

if self.p.nocase:
    colnames = [x.lower() for x in self.p.dataname.columns.values if isinstance(x, string_types)]

Alternatively, we could check whether all columns are strings. That said, I'm not sure whether there could be a case where string headers and int headers are mixed.
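If mixed headers can occur, the filtering comprehension drops the non-string labels, so the position of a name in colnames no longer matches its position in the DataFrame. That matters if those positions are later used as .iloc indices, as the comment in the quoted code suggests. The sketch below compares the two behaviours; normalize_labels is a hypothetical alternative, not part of backtrader, and str is used here as a Python 3 stand-in for string_types.

```python
def filter_strings(labels):
    """Guarded comprehension from the post: non-string labels are dropped."""
    return [x.lower() for x in labels if isinstance(x, str)]


def normalize_labels(labels):
    """Hypothetical alternative: lowercase strings, keep other labels in place."""
    return [x.lower() if isinstance(x, str) else x for x in labels]


# A made-up mixed header row, used only to show the difference.
mixed = [0, "Open", "High", 3, "Close"]

filtered = filter_strings(mixed)      # ['open', 'high', 'close']
normalized = normalize_labels(mixed)  # [0, 'open', 'high', 3, 'close']

# Position of 'close' differs: 2 after filtering, 4 when positions are kept.
print(filtered.index("close"), normalized.index("close"))
```

For headers that are all strings or all integers, the two functions differ only in that filtering returns an empty list for the all-integer case, while the alternative returns the integer labels unchanged.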

The list comprehension modification seems to be working. Any thoughts?

PandasData fails to autodetect columns in case the input dataframe uses indexes instead of names for columns #43