femtotrader opened this issue 10 years ago
Here is some sample code to illustrate (because I may not be clear enough):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
import pandas.io.data as web
import talib
from talib.abstract import *
start = "2010-01-01"
end = "2010-01-30"
d_columns_name = {'Open': 'open', 'High': 'high', 'Low': 'low', 'Close': 'close', 'Volume': 'volume', 'Adj Close': 'adj_close'}
lst_symb = ["AAPL", "GOOGL"]
panel = web.DataReader(lst_symb, 'yahoo', start, end)
print(panel)
#Items axis: Open to Adj Close
#Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
#Minor_axis axis: AAPL to GOOGL
panel = panel.transpose(2, 1, 0) # symbol is panel items
panel = panel.rename(minor_axis=d_columns_name)
#Items axis: AAPL to GOOGL
#Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
#Minor_axis axis: open to adj_close
print(panel)
#print(panel["AAPL"])
# Sample of a TA-Lib function which returns ONE value
df_results = pd.DataFrame(index=panel.major_axis, columns=panel.items)
for s in lst_symb:
    ts = SMA(panel[s], timeperiod=4)
    df_results[s] = ts
print(df_results)
# Sample of a TA-Lib function which returns SEVERAL values
talib_func = BBANDS
panel_results = pd.Panel(items=panel.items, major_axis=panel.major_axis, minor_axis=talib_func.output_names)
for s in lst_symb:
    df = talib_func(panel[s])
    panel_results[s] = df
print(panel_results)
print(panel_results["AAPL"])
panel = panel.transpose(2, 1, 0) # transpose again!
print(panel)
Ahh, seems like a neat idea. We have a few pandas integration ideas like this that would make things easier. I haven't worked with Panels much before, so let me see how easy it might be to use. Also, we provided some flexibility to subclass talib.abstract.Function, so you could modify the inputs and outputs methods to take and return your type of object if you wanted to work on this some before I get to it. The import from talib.abstract import SMA is the same as SMA = Function("SMA").
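A minimal sketch of that subclassing idea, assuming only that talib.abstract.Function is callable with a dict of named ndarrays and exposes output_names; the PandasFunction class name and the column mapping below are hypothetical, not part of the library:

import pandas as pd
from talib.abstract import Function

class PandasFunction(Function):
    """Hypothetical subclass that accepts and returns pandas objects."""

    def __call__(self, frame, *args, **kwargs):
        # Build the dict of lowercase-named ndarrays the abstract API expects,
        # assuming the DataFrame has DataReader-style column names.
        inputs = {name: frame[col].values
                  for col, name in [('Open', 'open'), ('High', 'high'),
                                    ('Low', 'low'), ('Close', 'close'),
                                    ('Volume', 'volume')]}
        outputs = Function.__call__(self, inputs, *args, **kwargs)
        # Multi-output functions return a list of ndarrays, single-output
        # functions a single ndarray.
        if isinstance(outputs, list):
            return pd.DataFrame(dict(zip(self.output_names, outputs)),
                                index=frame.index)
        return pd.Series(outputs, index=frame.index)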
You will have to take care of dimension order. pandas.io.data.DataReader returns OHLCV data as a Panel when a list of symbols is given:
Items axis: Open to Adj Close
Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
Minor_axis axis: AAPL to GOOGL
so, for that Panel, the columns of the OHLCV DataFrame ('Open', 'High', 'Low', 'Close', ...) are the items (first dimension, dimension 0), datetime is the major axis (second dimension, dimension 1, i.e. the row index of a DataFrame), and the symbol name is the minor axis (third dimension, dimension 2).
So we can get Open price for AAPL at 2010-01-04 using:
panel['Open']['AAPL']["2010-01-04"]
I think that order should be a "setup" of the TA-Lib wrapper (like column names... see https://github.com/mrjbq7/ta-lib/issues/66 ). Something like this could be useful.
In the TA-Lib wrapper code, define a default order:
talib.dimensions_order.DEFAULT = ['ohlcv', 'symbol', 'datetime']
On the user side:
talib.dimensions_order = talib.dimensions_order.DEFAULT
or
talib.dimensions_order = ['symbol', 'ohlcv', 'datetime']
On the TA-Lib wrapper side, you can build a dict from this list:
{val: key for key, val in enumerate(dimensions_order)}
so we will have
{'datetime': 2, 'ohlcv': 0, 'symbol': 1}
So after setting up the TA-Lib wrapper, we could pass a pandas DataFrame or Panel directly to a TA-Lib function, without renaming the DataFrame columns beforehand and without transposing the Panel's dimensions.
You will note that DataReader's default Panel dimension order is not very convenient... that's why I need to transpose before applying SMA or BBANDS.
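A minimal sketch of how a wrapper could use such a setting to normalize a Panel before processing; dimensions_order is the hypothetical setting proposed above, and the canonical target order chosen here is just an assumption:

# Hypothetical user-configured description of the incoming Panel's layout.
# DataReader's layout: items=ohlcv, major_axis=datetime, minor_axis=symbol.
dimensions_order = ['ohlcv', 'datetime', 'symbol']

def normalize_panel(panel, dimensions_order):
    """Transpose a Panel so that items=symbol, major=datetime, minor=ohlcv."""
    dims = {name: axis for axis, name in enumerate(dimensions_order)}
    canonical = ['symbol', 'datetime', 'ohlcv']
    # For DataReader's layout this is equivalent to panel.transpose(2, 1, 0).
    return panel.transpose(*[dims[name] for name in canonical])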
Thanks for the subclassing tip.
Without transposing, it's also possible to get a DataFrame from a Panel:
In [53]: panel.loc[:,:,"AAPL"]
Out[53]:
Open High Low Close Volume Adj Close
Date
2010-01-04 213.43 214.50 212.38 214.01 123432400 29.08
2010-01-05 214.60 215.59 213.25 214.38 150476200 29.13
2010-01-06 214.38 215.23 210.75 210.97 138040000 28.66
2010-01-07 211.75 212.00 209.05 210.58 119282800 28.61
2010-01-08 210.30 212.00 209.06 211.98 111902700 28.80
2010-01-11 212.80 213.00 208.45 210.11 115557400 28.55
2010-01-12 209.19 209.77 206.42 207.72 148614900 28.22
2010-01-13 207.87 210.93 204.10 210.65 151473000 28.62
2010-01-14 210.11 210.46 209.02 209.43 108223500 28.46
2010-01-15 210.93 211.60 205.87 205.93 148516900 27.98
2010-01-19 208.33 215.19 207.24 215.04 182501900 29.22
2010-01-20 214.91 215.55 209.50 211.73 153038200 28.77
2010-01-21 212.08 213.31 207.21 208.07 152038600 28.27
2010-01-22 206.78 207.50 197.16 197.75 220441900 26.87
2010-01-25 202.51 204.70 200.19 203.07 266424900 27.59
2010-01-26 205.95 213.71 202.58 205.94 466777500 27.98
2010-01-27 206.85 210.58 199.53 207.88 430642100 28.24
2010-01-28 204.93 205.50 198.70 199.29 293375600 27.08
2010-01-29 201.08 202.20 190.25 192.06 311488100 26.10
but
In [54]: panel.loc[:,:,["AAPL"]]
Out[54]:
<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 19 (major_axis) x 1 (minor_axis)
Items axis: Open to Adj Close
Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
Minor_axis axis: AAPL to AAPL
returns a "sub" panel
So it's possible to do the same without transposing
start = "2010-01-01"
end = "2010-01-30"
d_columns_name = {'Open': 'open', 'High': 'high', 'Low': 'low', 'Close': 'close', 'Volume': 'volume', 'Adj Close': 'adj_close'}
lst_symb = ["AAPL", "GOOGL"]
panel = web.DataReader(lst_symb, 'yahoo', start, end)
print(panel)
#Items axis: Open to Adj Close
#Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
#Minor_axis axis: AAPL to GOOGL
panel = panel.rename(items=d_columns_name)
#data = SMA(panel)
#Items axis: open to adj_close
#Major_axis axis: 2010-01-04 00:00:00 to 2010-01-29 00:00:00
#Minor_axis axis: AAPL to GOOGL
print(panel)
print(panel.loc[:,:,"AAPL"])
#data = SMA(panel)
# Sample of a TA-Lib function which returns ONE value
df_results = pd.DataFrame(index=panel.major_axis, columns=panel.minor_axis)
for s in lst_symb:
    ts = SMA(panel.loc[:,:,s], timeperiod=4)
    df_results[s] = ts
print(df_results)
# Sample of a TA-Lib function which returns SEVERAL values
talib_func = BBANDS
panel_results = pd.Panel(items=lst_symb, major_axis=panel.major_axis, minor_axis=talib_func.output_names)
for s in lst_symb:
    df = talib_func(panel.loc[:,:,s])
    panel_results[s] = df
print(panel_results)
print(panel_results["AAPL"])
Here's how I handled it. I think it's elegant:
def __init__(self):
    # Collect all candlestick pattern functions (CDL*) from talib.
    self.func_dict = {}
    for item in dir(talib):
        if item[0:3] == 'CDL':
            self.func_dict[item] = eval('talib.' + item)

def compute_candle(self, pattern, df):
    npa_result = self.func_dict[pattern](open=df['Open'].values,
                                         high=df['High'].values,
                                         low=df['Low'].values,
                                         close=df['Close'].values)
    df[pattern] = npa_result.tolist()
    return df
I know it doesn't apply to SMA or anything but candles... but I'm wrapping most of talib this way to make a pandas-friendly, idiot-proof layer. So, if you don't want to bother with it, it might just be worth waiting.
Does anyone have any thoughts on implementing a "just pandas convenience" layer, something like this: https://github.com/aking1012/pandastalib/blob/master/PandasTALib.py, instead of trying to make the existing library serve numpy, pandas, et al. and winding up with bugs like not accepting / returning pandas Series or DataFrames, and issues like this one? If the answer is "No, that's a bad idea", that would take me off this thread, but it seems like it could be a solution to multiple bugs.
I'd be happy to have pandas support built in. It used to take a pandas Series before they changed Series to not subclass ndarray, but it always produced an ndarray, I believe.
I need some guidance, though.
Looking at the Function interface, should it stay numpy-only, or be adapted to support pandas? Should it produce pandas.Series output if the input is a pandas.Series? What index should it use on the output? Should it check that all the inputs have the same index? Should it take an index as a separate argument?
Looking at the Abstract interface, we support calling with a pandas.DataFrame and pandas.Series, but we still return the outputs as ndarrays...
What are you looking for?
(Also, it's a shame to wrap each one the way you're doing; some kind of meta-programming would be a lot better, where you just loop over all the functions and generate the wrappers with a small amount of code.)
Responding to each part separately:
(First and last) I completely agree, but I don't know how to do the meta part. If someone who isn't me wants to show me how to do it, or point me at something that solves a similar problem in a way I can understand, I'd welcome it. I thought about implementing it as a script that parses the docstrings to get inputs, outputs, etc. and just wraps everything magically. It's also easier to write something that performs magic after you've done the menial tasks, separated everything, and figured out the thought train, then go back and automate.
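As a rough illustration of the loop idea, here is a minimal sketch, assuming talib.get_functions() and the abstract Function class are available in the installed wrapper; the build_abstract_funcs and build_candle_funcs names are hypothetical:

import talib
from talib import abstract

def build_abstract_funcs():
    """Hypothetical: build one abstract Function per TA-Lib function in a loop."""
    return {name: abstract.Function(name) for name in talib.get_functions()}

def build_candle_funcs():
    """Hypothetical: collect just the candlestick pattern functions, no eval()."""
    return {name: getattr(talib, name)
            for name in talib.get_functions() if name.startswith('CDL')}

# Usage sketch: funcs = build_abstract_funcs(); sma = funcs['SMA'](input_arrays, timeperiod=4)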
We both do, that's why I asked.
I would leave the function interface the way it is and decouple pandas support from that piece of code. It reads well for what it is; it just doesn't necessarily read as well as an end product for interfacing with pandas.
I haven't looked at the abstract interface at all. I'd rather take smaller bites.
The "looking for" bit is partly about finding out what everyone else is looking for: take away the learning curve altogether, so that people just try to use it and it works, something like:
df = pandas.io.web(['GOOG'], 'yahoo')
df = talib.PandasCompat._any_talibfunction[df, [optional, non-default, arguments, here]]
and it would ideally return a wider frame with the desired data. I need to know others' use cases to make it as good as I can, though.
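A minimal sketch of that "return a wider frame" behavior, built on the existing abstract API; the add_indicator name and its column-naming scheme are hypothetical:

import pandas as pd
from talib import abstract

def add_indicator(df, func_name, **params):
    """Hypothetical: run a TA-Lib function on an OHLCV DataFrame and
    append the result column(s), keeping the original index."""
    func = abstract.Function(func_name)
    # Assumes DataReader-style column names.
    inputs = {name: df[col].values
              for col, name in [('Open', 'open'), ('High', 'high'),
                                ('Low', 'low'), ('Close', 'close'),
                                ('Volume', 'volume')]}
    outputs = func(inputs, **params)
    if isinstance(outputs, list):          # multi-output, e.g. BBANDS
        for name, values in zip(func.output_names, outputs):
            df['%s_%s' % (func_name, name)] = values
    else:                                  # single output, e.g. RSI, SMA
        df[func_name] = outputs
    return df

# Usage sketch: df = add_indicator(df, 'RSI', timeperiod=5)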
This works right now:
import datetime
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2013, 1, 27)
import pandas.io.data as web
df = web.DataReader('GOOG', 'yahoo', start, end)
# fix column names, this could be handled better by talib
df.columns = [s.lower() for s in df.columns]
from talib.abstract import RSI
df['RSI'] = RSI(df, timeperiod=5)
I may need to read the abstract API to see what parts might not be working. I just thought it should work at the function level before I progressed to the abstract level.
The abstract API was contributed by someone who wanted a little more flexibility beyond the lightweight wrapper around the C functions. It might be of some interest to you.
Hi,
as you are using DataReader for tests, you might be interested in my project: http://pandas-datareaders-unofficial.readthedocs.org/en/latest/ It performs HTTP requests using requests and adds a cache mechanism using requests-cache.
@femtotrader - that will be useful when I get all the other parts built... and yes, I am using DataReader for fetches, but I'm also using a local cache of "all historical data I can get" to precompute literally everything.
@mrjbq7 - you're right, the abstract API does LOADS of what I would have needed to do this more succinctly. I think I'm going to solve it with a combination of that approach and my own.
Hello,
Calling SMA with a Panel should return a DataFrame (because SMA returns only one value) with a column per security.
TA-Lib functions which return several values (such as BBANDS) should return a Panel when a Panel is given.
For now it raises an exception.
Kind regards