anfederico / clairvoyant

Software designed to identify and monitor social/historical cues for short term stock movement
MIT License

TypeError: can't multiply sequence by non-int of type 'float' #4

Closed vinodpkd closed 8 years ago

vinodpkd commented 8 years ago

from clairvoyant import Backtest
from pandas import read_csv

# Testing performance on a single stock

variables  = ["SSO", "SSC"]     # Financial indicators of choice
trainStart = '2013-03-01'       # Start of training period
trainEnd   = '2015-07-15'       # End of training period
testStart  = '2015-07-16'       # Start of testing period
testEnd    = '2016-07-16'       # End of testing period
buyThreshold  = 0.65            # Confidence threshold for predicting buy (default = 0.65)
sellThreshold = 0.65            # Confidence threshold for predicting sell (default = 0.65)
C = 1                           # Penalty parameter (default = 1)
gamma = 10                      # Kernel coefficient (default = 10)
continuedTraining = False       # Continue training during testing period? (default = false)

backtest = Backtest(variables, trainStart, trainEnd, testStart, testEnd)

data = read_csv(r"H:/Python/ClairVoyantTest/sbux.csv")   # Read in data
data = data.round(3)                                      # Round all values
backtest.stocks.append("SBUX")                            # Inform the model which stock is being tested
for i in range(0,10):                                     # Run the model 10-15 times
    backtest.runModel(data)

backtest.displayConditions()
backtest.displayStats()


I ran the code above and got the following errors:

  File "", line 1, in <module>
    runfile('H:/Python/StockPredictionUsingClairVoyant.py', wdir='H:/Python')

  File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)

  File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "H:/Python/StockPredictionUsingClairVoyant.py", line 30, in <module>
    backtest.runModel(data)

  File "C:\Anaconda2\lib\site-packages\clairvoyant\Backtest.py", line 72, in runModel
    data['Date'] = to_datetime(data['Date'])

  File "C:\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1914, in __getitem__
    return self._getitem_column(key)

  File "C:\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1921, in _getitem_column
    return self._get_item_cache(key)

  File "C:\Anaconda2\lib\site-packages\pandas\core\generic.py", line 1090, in _get_item_cache
    values = self._data.get(item)

  File "C:\Anaconda2\lib\site-packages\pandas\core\internals.py", line 3102, in get
    loc = self.items.get_loc(item)

  File "C:\Anaconda2\lib\site-packages\pandas\core\index.py", line 1692, in get_loc
    return self._engine.get_loc(_values_from_object(key))

  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:3979)

  File "pandas\index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)

  File "pandas\hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12265)

  File "pandas\hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12216)

KeyError: 'Date'

runfile('H:/Python/StockPredictionUsingClairVoyant.py', wdir='H:/Python')
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('H:/Python/StockPredictionUsingClairVoyant.py', wdir='H:/Python')

  File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)

  File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "H:/Python/StockPredictionUsingClairVoyant.py", line 27, in <module>
    data = data.round(3) # Round all values

  File "C:\Anaconda2\lib\site-packages\pandas\core\frame.py", line 4335, in round
    new_cols = [np.round(self[col], decimals) for col in self]

  File "C:\Anaconda2\lib\site-packages\numpy\core\fromnumeric.py", line 2782, in round_
    return round(decimals, out)

  File "C:\Anaconda2\lib\site-packages\pandas\core\series.py", line 1234, in round
    result = _values_from_object(self).round(decimals, out=out)

TypeError: can't multiply sequence by non-int of type 'float'

What could be causing these errors?

anfederico commented 8 years ago

Can I see the first 5 lines of your data file?

vinodpkd commented 8 years ago

> Can I see the first 5 lines of your data file?

Here is the data format:

Date,Open,High,Low,Close,Volume
28-Oct-16,53.65,53.84,53.11,53.53,6620333
27-Oct-16,53.60,53.83,53.13,53.59,7899957
26-Oct-16,53.60,53.84,53.36,53.63,5817798
25-Oct-16,54.10,54.17,53.50,53.67,6052830
24-Oct-16,53.90,54.46,53.89,54.18,6919714

anfederico commented 8 years ago

One possible issue is that your dates run newest to oldest; they should be in the opposite order, oldest first. Also, the variables you listed do not appear in the header of your csv file.

variables = ["SSO", "SSC"] # If you set "SSO" and "SSC", they should be in your header

Try to format your data so it looks like this:

Date,Open,High,Low,Close,Volume,SSO,SSC
03/01/2013,27.72,27.98,27.52,27.95,34851872,65.7894736842,-0.121
03/04/2013,27.85,28.15,27.7,28.15,38167504,75.9450171821,0.832
03/05/2013,28.29,28.54,28.16,28.35,41437136,84.9230769231,0.151
03/06/2013,28.21,28.23,27.78,28.09,51448912,80.7799442897,-0.689
03/07/2013,28.11,28.28,28.005,28.14,29197632,73.5368956743,-0.821
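A minimal sketch of how the posted data could be put into oldest-first order, assuming the `28-Oct-16` style dates parse with the format `%d-%b-%y` and that writing them back as `MM/DD/YYYY` strings (the style in the example above) is acceptable. The helper name `to_ascending_dates` and the temporary `_parsed` column are made up for illustration; this is not code from clairvoyant itself.

```python
import io

import pandas as pd

# The five rows posted earlier in this thread (newest first).
CSV_TEXT = """Date,Open,High,Low,Close,Volume
28-Oct-16,53.65,53.84,53.11,53.53,6620333
27-Oct-16,53.60,53.83,53.13,53.59,7899957
26-Oct-16,53.60,53.84,53.36,53.63,5817798
25-Oct-16,54.10,54.17,53.50,53.67,6052830
24-Oct-16,53.90,54.46,53.89,54.18,6919714
"""


def to_ascending_dates(frame, date_format="%d-%b-%y"):
    """Return a copy of frame sorted oldest-first, with Date as MM/DD/YYYY text.

    Hypothetical helper: date_format is an assumption about the input file.
    """
    out = frame.copy()
    # Parse the text dates so the sort is chronological, not alphabetical.
    out["_parsed"] = pd.to_datetime(out["Date"], format=date_format)
    out = out.sort_values("_parsed")
    # Write the dates back as text in the MM/DD/YYYY style.
    out["Date"] = out["_parsed"].dt.strftime("%m/%d/%Y")
    return out.drop(columns="_parsed").reset_index(drop=True)


data = pd.read_csv(io.StringIO(CSV_TEXT))
data = to_ascending_dates(data)
print(data)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the csv, and `date_format` adjusted to match however the dates are written there.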

vinodpkd commented 8 years ago

Can I use variables = [] instead of variables = ["SSO", "SSC"]? I don't understand SSO and SSC. I downloaded the data from Google Finance.

vinodpkd commented 8 years ago

data.iloc[0]
Out[20]:
Date      01/02/2013
Open            27.3
High            27.5
Low            27.13
Close           27.5
Volume      13268930
Name: 0, dtype: object

data.iloc[0].round(3)
Traceback (most recent call last):

  File "", line 1, in <module>
    data.iloc[0].round(3)

  File "C:\Anaconda2\lib\site-packages\pandas\core\series.py", line 1234, in round
    result = _values_from_object(self).round(decimals, out=out)

TypeError: can't multiply sequence by non-int of type 'float'

The line data = data.round(3) # Round all values is causing the error. Could it be that it cannot round the Date column?
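If the text-valued Date column is indeed what `round` trips over, one possible workaround is to round only the numeric columns and leave everything else untouched. The sketch below assumes that diagnosis is right (the thread does not confirm it); the helper name `round_numeric_columns` and the sample rows are made up for illustration.

```python
import io

import pandas as pd

# Made-up sample rows in the same column layout as the file in this thread.
CSV_TEXT = """Date,Open,High,Low,Close,Volume
01/02/2013,27.3004,27.5001,27.1299,27.4996,13268930
01/03/2013,27.5512,27.6049,27.2551,27.3333,11203400
"""


def round_numeric_columns(frame, decimals=3):
    """Return a copy of frame with only its numeric columns rounded.

    Hypothetical helper: text columns such as Date are left as they are.
    """
    rounded = frame.copy()
    # select_dtypes(include="number") picks integer and float columns only.
    numeric_cols = rounded.select_dtypes(include="number").columns
    rounded[numeric_cols] = rounded[numeric_cols].round(decimals)
    return rounded


data = pd.read_csv(io.StringIO(CSV_TEXT))
data = round_numeric_columns(data, 3)
print(data)
```

In the original script this would stand in for the `data = data.round(3)` line, with `data` read from the csv file instead of the inline sample.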

uclatommy commented 8 years ago

vinodpkd, SSO and SSC are custom-designed indicators that the author is using to train his model. At the end of his readme he provides a link to another project that is producing these values. Looks like he's mining social media to provide a social sentiment score.

If you don't understand the values, I think you can use any other data to train your model, for example P/E ratios if you're more familiar with that indicator. Basically, this training data is essential for the model to "learn" how to indicate a buy or sell. It runs a probabilistic classifier over a training set in order to develop an association between the indicators and a buy or sell recommendation. So before you can go any further, you have to provide at least some data as the indicator.

anfederico commented 8 years ago

Exactly, for example I see your dataset contains the header: "Date,Open,High,Low,Close,Volume"

You could then do something like variables = ["High", "Low"] and the program will attempt to learn when to buy and when to sell based on the High and Low values of your data. Note that this is probably not going to work because the raw High/Low values alone are not very predictive of how the stock price will move. Therefore, you should try different indicators, like P/E for example.

"Date,Open,High,Low,Close,Volume,P/E"

Hope this helps!
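A quick way to catch the header mismatch discussed above before running a backtest is to compare the chosen variables against the csv columns. This is only a sketch: `missing_columns` is a hypothetical helper, not part of clairvoyant, and treating `Date` as required is an assumption based on the traceback above, where `runModel` reads `data['Date']`.

```python
import io

import pandas as pd

# Header and first row in the layout posted earlier in this thread.
CSV_TEXT = """Date,Open,High,Low,Close,Volume
28-Oct-16,53.65,53.84,53.11,53.53,6620333
"""


def missing_columns(frame, variables, required=("Date",)):
    """List the required and chosen column names that frame does not contain."""
    wanted = list(required) + list(variables)
    return [name for name in wanted if name not in frame.columns]


data = pd.read_csv(io.StringIO(CSV_TEXT))

absent_custom = missing_columns(data, ["SSO", "SSC"])   # indicators not in this file
absent_price = missing_columns(data, ["High", "Low"])   # columns the file does have

print(absent_custom)
print(absent_price)
```

An empty list means every name in `variables` (plus `Date`) is present in the header; anything else names the columns that would need to be added to the file or dropped from `variables`.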