TA-Lib / ta-lib-python

Python wrapper for TA-Lib (http://ta-lib.org/).
http://ta-lib.github.io/ta-lib-python

Abstract TA-Lib functions with Pandas Panel should return a Pandas Panel with TA-Lib values (or Pandas DataFrame) #68

Open femtotrader opened 10 years ago

femtotrader commented 10 years ago

Hello,

import pandas as pd
import pandas.io.data as web
from talib.abstract import *
panel=web.DataReader(["AAPL", "GOOGL"], 'yahoo', "2010-01-01", "2010-01-30")
SMA(panel, timeperiod=4, price='close')

This should return a DataFrame (because SMA returns only one value) with one column per security.

TA-Lib functions that return several values (such as BBANDS) should return a Panel when a Panel is given.

For now it raises:

ValueError                                Traceback (most recent call last)
<ipython-input-7-ce3f460ebfe2> in <module>()
      3 from talib.abstract import *
      4 panel=web.DataReader(["AAPL", "GOOGL"], 'yahoo', "2010-01-01", "2010-01-30")
----> 5 SMA(panel, timeperiod=4, price='close')

/Users/femto/.python-eggs/TA_Lib-0.4.8-py2.7-macosx-10.6-x86_64.egg-tmp/talib/abstract.so in talib.abstract.Function.__call__ (talib/abstract.c:6220)()

/Users/femto/.python-eggs/TA_Lib-0.4.8-py2.7-macosx-10.6-x86_64.egg-tmp/talib/abstract.so in talib.abstract.Function.set_function_args (talib/abstract.c:5143)()

/Users/femto/.python-eggs/TA_Lib-0.4.8-py2.7-macosx-10.6-x86_64.egg-tmp/talib/abstract.so in talib.abstract.Function.get_parameters (talib/abstract.c:4363)()

/Users/femto/.python-eggs/TA_Lib-0.4.8-py2.7-macosx-10.6-x86_64.egg-tmp/talib/abstract.so in talib.abstract.Function.__get_opt_input_value (talib/abstract.c:7132)()

/Users/femto/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
    690         raise ValueError("The truth value of a {0} is ambiguous. "
    691                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 692                          .format(self.__class__.__name__))
    693
    694     __bool__ = __nonzero__

ValueError: The truth value of a Panel is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Kind regards

femtotrader commented 10 years ago

Here is some sample code to make this clearer (because I may not have been clear enough):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd
import pandas.io.data as web

import talib
from talib.abstract import *

start = "2010-01-01"
end = "2010-01-30"

d_columns_name = {'Open': 'open', 'High': 'high', 'Low': 'low', 'Close': 'close', 'Volume': 'volume', 'Adj Close': 'adj_close'}

lst_symb = ["AAPL", "GOOGL"]
panel = web.DataReader(lst_symb, 'yahoo', start, end)
print(panel)
#Items axis: Open to Adj Close
#Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
#Minor_axis axis: AAPL to GOOGL
panel = panel.transpose(2, 1, 0) # symbol is panel items
panel = panel.rename(minor_axis=d_columns_name)
#Items axis: AAPL to GOOGL
#Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
#Minor_axis axis: Open to Adj Close
print(panel)
#print(panel["AAPL"])

# Sample of a TA-Lib function which returns ONE value
df_results = pd.DataFrame(index=panel.major_axis, columns=panel.items)
for s in lst_symb:
    ts = SMA(panel[s], timeperiod=4)
    df_results[s] = ts
print(df_results)

# Sample of a TA-Lib function which returns SEVERAL values
talib_func = BBANDS
panel_results = pd.Panel(items=panel.items, major_axis=panel.major_axis, minor_axis=talib_func.output_names)
for s in lst_symb:
    df = talib_func(panel[s])
    panel_results[s] = df
print(panel_results)
print(panel_results["AAPL"])
panel = panel.transpose(2, 1, 0) # transpose again!
print(panel)

mrjbq7 commented 10 years ago

Ahh, seems like a neat idea. We have a few "pandas" integration ideas like this that would make things easier. I haven't worked with Panels much before so let me see how easy it might be to use. Also, we provided some flexibility to subclass talib.abstract.Function, so you could modify the inputs and outputs methods to take and return your type of object if you wanted to work on this some before I get to it. The from talib.abstract import SMA is the same as SMA = Function("SMA").

femtotrader commented 10 years ago

You will have to take care of the dimension order. pandas.io.data.DataReader returns OHLCV data as a Panel when a list of symbols is given.

Items axis: Open to Adj Close
Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
Minor_axis axis: AAPL to GOOGL

So:

the columns of the OHLCV DataFrame ('Open', 'High', 'Low', 'Close') are the items of the Panel (first dimension, or dimension 0)
the symbol name is the minor axis of the Panel (second dimension, or dimension 1)
the datetime is the major axis of the Panel (third dimension, or dimension 2), i.e. the row index of a DataFrame

So we can get Open price for AAPL at 2010-01-04 using:

panel['Open']['AAPL']["2010-01-04"]

I think the order should be a configurable setting of TA-Lib (like column names; see https://github.com/mrjbq7/ta-lib/issues/66). Something like this could be useful:

In TA-Lib wrapper code define a default order

talib.dimensions_order.DEFAULT = ['ohlcv', 'symbol', 'datetime']

On user side

talib.dimensions_order = talib.dimensions_order.DEFAULT 

or

talib.dimensions_order = ['symbol', 'ohlcv', 'datetime']

On TA-Lib wrapper code side you can build a dict from this list

{val: key for key, val in enumerate(dimensions_order)}

so we will have

{'datetime': 2, 'ohlcv': 0, 'symbol': 1}

So after setting up TA-Lib wrapper we can give Pandas DataFrame or Panel very simply to a TA-Lib function (without renaming columns of DataFrame before passing it), or without transposing dimension order of a Panel.
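The proposal above can be sketched in plain Python. This is only an illustration of the idea: `dimensions_order` is the setting proposed in this thread, not an existing TA-Lib attribute, and the helper names `axis_positions` and `transpose_axes` are hypothetical. The second helper shows how a wrapper could derive the transpose permutation from two orders instead of asking the user to transpose by hand.

```python
def axis_positions(dimensions_order):
    """Map each dimension name to its position, as in the dict shown above."""
    return {name: position for position, name in enumerate(dimensions_order)}


def transpose_axes(current_order, wanted_order):
    """Return the axis permutation that turns current_order into wanted_order."""
    if sorted(current_order) != sorted(wanted_order):
        raise ValueError("both orders must name the same dimensions")
    positions = axis_positions(current_order)
    # New axis i is taken from the old axis that carries the same name.
    return tuple(positions[name] for name in wanted_order)


# Hypothetical default order from the proposal.
default_order = ['ohlcv', 'symbol', 'datetime']
print(axis_positions(default_order))

# Axis names as printed for the DataReader panel (items, major, minor)
# and for the symbol-first layout used in the sample code.
reader_order = ['ohlcv', 'datetime', 'symbol']
symbol_first = ['symbol', 'datetime', 'ohlcv']
print(transpose_axes(reader_order, symbol_first))
```

With these two orders the permutation comes out as (2, 1, 0), the same arguments passed to panel.transpose in the sample code earlier in the thread.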

You will note that DataReader default panel output dimension order is not very convenient... that's why I need to transpose before applying SMA or BBANDS

Thanks for the subclassing tip.

femtotrader commented 10 years ago

Without transposing, it's also possible to get a DataFrame from a Panel:

In [53]: panel.loc[:,:,"AAPL"]
Out[53]:
              Open    High     Low   Close     Volume  Adj Close
Date
2010-01-04  213.43  214.50  212.38  214.01  123432400      29.08
2010-01-05  214.60  215.59  213.25  214.38  150476200      29.13
2010-01-06  214.38  215.23  210.75  210.97  138040000      28.66
2010-01-07  211.75  212.00  209.05  210.58  119282800      28.61
2010-01-08  210.30  212.00  209.06  211.98  111902700      28.80
2010-01-11  212.80  213.00  208.45  210.11  115557400      28.55
2010-01-12  209.19  209.77  206.42  207.72  148614900      28.22
2010-01-13  207.87  210.93  204.10  210.65  151473000      28.62
2010-01-14  210.11  210.46  209.02  209.43  108223500      28.46
2010-01-15  210.93  211.60  205.87  205.93  148516900      27.98
2010-01-19  208.33  215.19  207.24  215.04  182501900      29.22
2010-01-20  214.91  215.55  209.50  211.73  153038200      28.77
2010-01-21  212.08  213.31  207.21  208.07  152038600      28.27
2010-01-22  206.78  207.50  197.16  197.75  220441900      26.87
2010-01-25  202.51  204.70  200.19  203.07  266424900      27.59
2010-01-26  205.95  213.71  202.58  205.94  466777500      27.98
2010-01-27  206.85  210.58  199.53  207.88  430642100      28.24
2010-01-28  204.93  205.50  198.70  199.29  293375600      27.08
2010-01-29  201.08  202.20  190.25  192.06  311488100      26.10

but

In [54]: panel.loc[:,:,["AAPL"]]
Out[54]:
<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 19 (major_axis) x 1 (minor_axis)
Items axis: Open to Adj Close
Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
Minor_axis axis: AAPL to AAPL

returns a "sub" panel

femtotrader commented 10 years ago

So it's possible to do the same without transposing

start = "2010-01-01"
end = "2010-01-30"

d_columns_name = {'Open': 'open', 'High': 'high', 'Low': 'low', 'Close': 'close', 'Volume': 'volume', 'Adj Close': 'adj_close'}

lst_symb = ["AAPL", "GOOGL"]
panel = web.DataReader(lst_symb, 'yahoo', start, end)
print(panel)
#Items axis: Open to Adj Close
#Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
#Minor_axis axis: AAPL to GOOGL
panel = panel.rename(items=d_columns_name)
#data = SMA(panel)
#Items axis: open to adj_close
#Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
#Minor_axis axis: AAPL to GOOGL
print(panel)
print(panel.loc[:,:,"AAPL"])
#data = SMA(panel)

# Sample of a TA-Lib function which returns ONE value
df_results = pd.DataFrame(index=panel.major_axis, columns=panel.minor_axis)

for s in lst_symb:
    ts = SMA(panel.loc[:,:,s], timeperiod=4)
    df_results[s] = ts
print(df_results)

# Sample of a TA-Lib function which returns SEVERAL values
talib_func = BBANDS
# Without the transpose, the symbols live on the minor axis of the input panel
panel_results = pd.Panel(items=panel.minor_axis, major_axis=panel.major_axis, minor_axis=talib_func.output_names)
for s in lst_symb:
    df = talib_func(panel.loc[:,:,s])
    panel_results[s] = df
print(panel_results)
print(panel_results["AAPL"])

aking1012 commented 9 years ago

Here's how I handled it. I think it's elegant:

def __init__(self):
    # Collect every candlestick-pattern function (names start with 'CDL').
    self.func_dict = {}
    for item in dir(talib):
        if item.startswith('CDL'):
            self.func_dict[item] = getattr(talib, item)

def compute_candle(self, pattern, df):
    # Run one pattern on the OHLC columns and store the result as a new column.
    npa_result = self.func_dict[pattern](open=df['Open'].values,
                                         high=df['High'].values,
                                         low=df['Low'].values,
                                         close=df['Close'].values)
    df[pattern] = npa_result.tolist()
    return df

I know this only covers the candlestick functions, not SMA or the others, but I'm wrapping most of talib this way to make a pandas-friendly, idiot-proof layer. So if you don't want to bother with it, it might be worth waiting.

aking1012 commented 9 years ago

Does anyone have thoughts on implementing a "just pandas convenience" layer, something like https://github.com/aking1012/pandastalib/blob/master/PandasTALib.py, instead of trying to make the existing library serve numpy, pandas, et al. and winding up with bugs such as not accepting or returning pandas Series or DataFrames, or this very issue? If the answer is "No, that's a bad idea," that would take me off this thread, but it seems like it could be a solution to multiple bugs.

mrjbq7 commented 9 years ago

I'd be happy to have pandas support built in. It used to take pandas Series before they changed Series to not subclass ndarray. But it always used to produce an ndarray, I believe.

I need some guidance, though.

Looking at the Function interface, should it stay numpy-only, or be adapted to support pandas? Should it produce pandas.Series output if the input is pandas.Series? What index should it use on the output? Should it check all the inputs to be the same index? Should it take an index as a separate argument?
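One possible set of answers to these questions can be sketched as a decorator: labeled input gives labeled output, the output reuses the input index, and mismatched indexes are rejected. Everything here is an assumption for illustration. `LabeledSeries` is a tiny stand-in for pandas.Series so the sketch runs without pandas, `preserve_index` is a hypothetical name, and `sma` and `spread` are simple stand-ins for TA-Lib functions (they use None for the warm-up period where TA-Lib would return NaN).

```python
import functools


class LabeledSeries:
    """Tiny stand-in for pandas.Series: values plus an index of equal length."""

    def __init__(self, values, index):
        if len(values) != len(index):
            raise ValueError("values and index must have the same length")
        self.values = list(values)
        self.index = list(index)


def preserve_index(func):
    """Wrap a list-based function so labeled inputs give a labeled output."""
    @functools.wraps(func)
    def wrapper(*inputs, **options):
        labeled = [x for x in inputs if isinstance(x, LabeledSeries)]
        if not labeled:
            # Plain sequences in, plain list out: behavior is unchanged.
            return func(*inputs, **options)
        index = labeled[0].index
        if any(x.index != index for x in labeled[1:]):
            raise ValueError("all labeled inputs must share the same index")
        raw = [x.values if isinstance(x, LabeledSeries) else x for x in inputs]
        return LabeledSeries(func(*raw, **options), index)
    return wrapper


@preserve_index
def sma(values, timeperiod=3):
    """Stand-in simple moving average; None marks the warm-up period."""
    out = [None] * len(values)
    for i in range(timeperiod - 1, len(values)):
        out[i] = sum(values[i - timeperiod + 1:i + 1]) / timeperiod
    return out


@preserve_index
def spread(high, low):
    """Stand-in two-input function, used to show the index check."""
    return [h - l for h, l in zip(high, low)]


prices = LabeledSeries([1.0, 2.0, 3.0, 4.0], ["d1", "d2", "d3", "d4"])
result = sma(prices, timeperiod=2)
print(list(zip(result.index, result.values)))
```

Taking the index from the inputs avoids a separate index argument, at the cost of refusing inputs whose indexes differ rather than trying to align them.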

Looking at the Abstract interface, we support calling with a pandas.DataFrame and pandas.Series, but we still return arguments as ndarrays...

What are you looking for?

(Also it's a shame to wrap each one the way you're doing, some kind of meta-programming would be a lot better where you just loop over all the functions and generate those with a small amount of code).
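A minimal sketch of that meta-programming idea: one factory builds a frame-aware wrapper for any function that takes open/high/low/close keyword arguments, and a loop applies it to a whole registry. The names `make_frame_wrapper`, `STAND_INS` and `OHLC_COLUMNS` are hypothetical, the two indicator functions are stand-ins rather than TA-Lib functions, and a plain dict of lists stands in for a DataFrame so the sketch runs without pandas or TA-Lib. With the real library the registry might instead be built from talib.get_functions(), and the columns would need to be passed as numpy arrays.

```python
def make_frame_wrapper(func, column_map):
    """Build a wrapper that reads the mapped columns from a frame-like
    object and passes them to func as keyword arguments."""
    def wrapper(frame, **options):
        inputs = {arg: frame[column] for arg, column in column_map.items()}
        return func(**inputs, **options)
    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    return wrapper


# Stand-in functions with the same keyword names as the candlestick calls above.
def bullish_bar(open, high, low, close):
    """Return 100 where the bar closed above its open, else 0."""
    return [100 if c > o else 0 for o, c in zip(open, close)]


def bar_range(open, high, low, close):
    """Return the high-low range of each bar."""
    return [h - l for h, l in zip(high, low)]


STAND_INS = {"BULLISH_BAR": bullish_bar, "BAR_RANGE": bar_range}
OHLC_COLUMNS = {"open": "Open", "high": "High", "low": "Low", "close": "Close"}

# One loop generates every wrapper; no per-function boilerplate.
wrapped = {name: make_frame_wrapper(func, OHLC_COLUMNS)
           for name, func in STAND_INS.items()}

frame = {
    "Open": [10, 12, 11],
    "High": [13, 12, 14],
    "Low": [9, 10, 11],
    "Close": [12, 11, 13],
}
for name, func in wrapped.items():
    frame[name] = func(frame)  # add each result as a new column

print(frame["BULLISH_BAR"], frame["BAR_RANGE"])
```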

aking1012 commented 9 years ago

Responding to each part separately:

(First and last) I completely agree, but I don't know how to do the meta part. If someone who isn't me wants to show me how to do it or point me at something that solves a similar problem that way which I can understand, I would. I thought about implementing it as a script to parse the docstring and get ins-outs-etc and just wrap everything magically. It's also easier to write something that performs magic after you perform the menial tasks, separate everything, and figure out the thought train - then go back and automate.

We both do, that's why I asked.

I would leave the function interface the way it is and decouple pandas support and that piece of code. It reads well for what it is. It just doesn't read necessarily as well as an end product to interface with pandas.

I haven't looked at the abstract interface at all. I'd rather take smaller bites.

As for what I'm looking for: partly to find out what everyone else is looking for, and to take away the learning curve altogether. That way people just try to use it, and it works, something like:

df = pandas.io.web(['GOOG'], 'yahoo')
df = talib.PandasCompat._any_talibfunction[df, [optional, non-default, arguments, here]]

and it would ideally return a wider frame with the desired data for me. I need to know the use case for others to make it as good as I can make it though.

mrjbq7 commented 9 years ago

This works right now:

import datetime
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2013, 1, 27)

import pandas.io.data as web
df = web.DataReader('GOOG', 'yahoo', start, end)

# fix column names, this could be handled better by talib
df.columns = [s.lower() for s in df.columns]

from talib.abstract import RSI
df['RSI'] = RSI(df, timeperiod=5)

aking1012 commented 9 years ago

I may need to read the abstract api to see what parts might not be working. I just thought it should just work at the function level before I progressed to the abstract level.

mrjbq7 commented 9 years ago

The abstract API was contributed by someone who wanted a little more flexibility beyond the lightweight wrapper around the C functions. It might be of some interest to you.

femtotrader commented 9 years ago

Hi,

As you are using DataReader for tests, you might be interested in my project: http://pandas-datareaders-unofficial.readthedocs.org/en/latest/. It performs HTTP requests using requests and adds a cache mechanism using requests-cache.

aking1012 commented 9 years ago

@femtotrader - that will be useful when I get all the other parts built... and yes, I am using DataReader for fetches, but I'm also using a local cache of "all historical data I can get" to precompute literally everything.

aking1012 commented 9 years ago

@mrjbq7 - you're right, the abstract api does LOADS of what I would have needed to do this more succinctly. I think I'm going to solve it with a combination of that approach and my own.

aking1012 commented 9 years ago

For the abstract API, something like this, with some added type checks and casting, would solve the problem.