We've made changes to the prediction function. Please read the new documentation.
This problem requires a mix of statistics and data analysis skills to create a predictive model using financial data. We will provide you with a toolbox and historical data to develop and test your strategy for the competition.
You need Python 2.7 (Python 3 will be supported later) to run this toolbox. For an easy installation process, we recommend Anaconda since it will reliably install all the necessary dependencies. Download Anaconda and follow the instructions on the installation page. Once you have Python, you can then install the toolbox.
There are multiple ways to install the toolbox for the competition.
The easiest and recommended way is via pip. Just run the following command:
pip install -U auquan_toolbox
If we publish any updates to the toolbox, the same command also installs the new version.
Note: Mac users, if you face any issues with installation, try using 'pip install --user auquan_toolbox'
Run the following command to make sure everything is set up properly:
python problem1.py
Use problem1.py as a template; it contains skeleton functions (with explanations) that you need to fill in to create your own trading strategy. For problem 1, fill in the getFairValue() function. For problem 2, fill in the getClassifierProbability() function.
The data for the competition is provided here. The toolbox auto-downloads and loads the data for you. You can specify the training dataset you want to load in the getTrainingDataSet() function.
def getTrainingDataSet(self):
    # Set this to trainingData1 or trainingData2 or trainingData3
    return "sampleData"
You can specify the instruments to load in the getSymbolsToTrade() function. If you return an empty array, it downloads all the stocks.
def getSymbolsToTrade(self):
    return []
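As a minimal sketch of how these two methods sit together, the stand-in class below returns a dataset name and a restricted symbol list. The class name `MyTradingParams` and the symbols `'AAA'` and `'BBB'` are hypothetical placeholders, not names taken from the toolbox or the competition data; in practice you would edit the methods already present in problem1.py.

```python
class MyTradingParams(object):
    """Hypothetical stand-in for the class in problem1.py that holds these methods."""

    def getTrainingDataSet(self):
        # One of "sampleData", "trainingData1", "trainingData2", "trainingData3"
        return "trainingData1"

    def getSymbolsToTrade(self):
        # Placeholder symbols; return [] instead to load every stock.
        return ['AAA', 'BBB']


params = MyTradingParams()
print(params.getTrainingDataSet())
print(params.getSymbolsToTrade())
```

Restricting the symbol list can be useful while prototyping, since fewer instruments need to be downloaded and processed.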
You then need to create features and combine them in the prediction function to generate your predictions.
Features and predictions are explained below. The toolbox also provides extensive functionality and customization. While not required for the competition, you can read more about the toolbox here.
Fill in the features you want to use in the getFeatureConfigDicts() function. Features are requested by specifying config dictionaries. Create one dictionary per feature and return them in a list.
Each feature config dictionary has the following keys:
featureId: a string representing the type of feature you want to use
featureKey: {optional} a string representing the key you will use to access the value of this feature. If not present, featureId is used as the key.
params: {optional} a dictionary containing any other parameters the feature needs
Example: If you only want to use the moving_sum feature, your getFeatureConfigDicts() function should be:
def getFeatureConfigDicts(self):
    msDict = {'featureKey': 'ms_5',
              'featureId': 'moving_sum',
              'params': {'period': 5,
                         'featureName': 'basis'}}
    return [msDict]
You can now use this feature by calling its featureKey, 'ms_5'.
The full list of features with featureId and params is available here.
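Several features can be requested at once by returning more than one dictionary. The sketch below reuses the 'basis' series from the moving_sum example and takes its featureIds and params from the feature list. The helper `makeFeatureConfig` and the keys `'ma_10'` and `'sdev_10'` are illustrative names chosen here, not part of the toolbox.

```python
def makeFeatureConfig(featureId, featureKey=None, params=None):
    """Hypothetical helper: build one feature config dictionary."""
    config = {'featureId': featureId}
    if featureKey is not None:
        config['featureKey'] = featureKey  # optional; featureId is used when absent
    if params is not None:
        config['params'] = params          # optional; depends on the feature
    return config


def getFeatureConfigDicts():
    # Written as a plain function here; in problem1.py it is a method taking self.
    maDict = makeFeatureConfig('moving_average', 'ma_10',
                               {'period': 10, 'featureName': 'basis'})
    sdevDict = makeFeatureConfig('moving_sdev', 'sdev_10',
                                 {'period': 10, 'featureName': 'basis'})
    return [maDict, sdevDict]


configs = getFeatureConfigDicts()
print([c['featureKey'] for c in configs])
```

Giving each dictionary a distinct featureKey lets the same featureId be used more than once, for example with different periods.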
Custom Features
To use your own custom features, follow the example of class MyCustomFeature() in problem1.py. Specifically, you'll have to:
Create a new class for the feature and implement your logic in the function computeForInstrument(). You can copy the class from MyCustomFeature().
Example:
class MyCustomFeatureClassName(Feature):
    @classmethod
    def computeForInstrument(cls, featureParams, featureKey, currentFeatures, instrument, instrumentManager):
        return 5
Modify the function getCustomFeatures() to return a dictionary mapping an Id to this class, in the format {'my_custom_feature_identifier': MyCustomFeatureClassName}. Make sure 'my_custom_feature_identifier' doesn't conflict with any of the predefined feature Ids.
def getCustomFeatures(self):
    return {'my_custom_feature_identifier': MyCustomFeatureClassName}
Create a dict for this feature in getFeatureConfigDicts(). The dict format is:
customFeatureDict = {'featureKey': 'my_custom_feature_key',
                     'featureId': 'my_custom_feature_identifier',
                     'params': {'param1': 'value1'}}
You can now use this feature by calling its featureKey, 'my_custom_feature_key'.
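The three custom-feature steps can be sketched end to end as follows. To keep the snippet runnable on its own, `Feature` is defined here as an empty stand-in for the toolbox's base class, and the lookup at the bottom only imitates what the toolbox might do internally. It also assumes that `featureParams` receives the `'params'` dictionary from the config, which this document does not state explicitly.

```python
class Feature(object):
    """Stand-in for the toolbox's Feature base class, so the sketch runs on its own."""


class MyCustomFeatureClassName(Feature):
    @classmethod
    def computeForInstrument(cls, featureParams, featureKey, currentFeatures,
                             instrument, instrumentManager):
        # Illustrative logic only: echo the hypothetical 'param1' value, or 5 if absent.
        return featureParams.get('param1', 5)


def getCustomFeatures():
    # Map a unique identifier to the custom feature class.
    return {'my_custom_feature_identifier': MyCustomFeatureClassName}


def getFeatureConfigDicts():
    # Request the custom feature through a config dictionary.
    customFeatureDict = {'featureKey': 'my_custom_feature_key',
                         'featureId': 'my_custom_feature_identifier',
                         'params': {'param1': 'value1'}}
    return [customFeatureDict]


# Imitate a lookup by featureId followed by a computation (assumed mechanism).
config = getFeatureConfigDicts()[0]
featureClass = getCustomFeatures()[config['featureId']]
value = featureClass.computeForInstrument(config['params'], config['featureKey'],
                                          currentFeatures=None, instrument=None,
                                          instrumentManager=None)
print(value)
```

The featureId ties the config dictionary to the class, while the featureKey is the name under which the computed values are read back later.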
Instrument features are calculated per instrument (for example position, fees, moving average of instrument price). The toolbox auto-loops through all instruments to calculate features for you.
Combine all the features to create the desired prediction function. For problem 1, fill in the function getFairValue() to return the predicted FairValue (expected average of future values).
Here you can call your previously created features by referencing their featureKey. For example, the moving sum feature with featureKey 'ms_5' can be used as:
def getFairValue(self, updateNum, time, instrumentManager):
    # Holder for all the instrument features
    lookbackInstrumentFeatures = instrumentManager.getLookbackInstrumentFeatures()
    # DataFrame for a historical instrument feature (ms_5 in this case). The index holds the
    # timestamps, with at most lookback data points. The columns are the stock symbols/instrumentIds.
    ms5Data = lookbackInstrumentFeatures.getFeatureDf('ms_5')
    # Series indexed by instrumentId, holding the value of the feature at the last time update
    ms5 = ms5Data.iloc[-1]
    return ms5
Important: Previously, we called lookbackInstrumentFeatures = instrument.getDataDf(), which returned the holder for all instrument features, and then lookbackInstrumentFeatures['ms_5'], which returned a DataFrame for that feature for one stock. Now we first get the holder for all the instrument features with lookbackInstrumentFeatures = instrumentManager.getLookbackInstrumentFeatures() and then the DataFrame for the feature with lookbackInstrumentFeatures.getFeatureDf('ms_5'), which returns a DataFrame for that feature for ALL stocks at the same time. The rest of the code is the same.
Output of the prediction function is used by the toolbox to make further trading decisions and evaluate your score.
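To make the shape of the data concrete, the sketch below runs the same getFairValue() logic against small stand-in objects. `FakeLookbackFeatures` and `FakeInstrumentManager` are hypothetical classes written for this example that only mimic the two calls described above. The symbols and numbers are made up, and pandas is assumed to be installed.

```python
import pandas as pd


class FakeLookbackFeatures(object):
    """Stand-in for the holder returned by getLookbackInstrumentFeatures()."""

    def __init__(self, featureDfs):
        self._featureDfs = featureDfs

    def getFeatureDf(self, featureKey):
        return self._featureDfs[featureKey]


class FakeInstrumentManager(object):
    """Stand-in for the toolbox's instrumentManager."""

    def __init__(self, lookbackFeatures):
        self._lookbackFeatures = lookbackFeatures

    def getLookbackInstrumentFeatures(self):
        return self._lookbackFeatures


def getFairValue(updateNum, time, instrumentManager):
    lookbackInstrumentFeatures = instrumentManager.getLookbackInstrumentFeatures()
    ms5Data = lookbackInstrumentFeatures.getFeatureDf('ms_5')
    # Last row: one value per instrument at the most recent update.
    return ms5Data.iloc[-1]


# Made-up values: rows are timestamps, columns are hypothetical instrumentIds.
index = pd.to_datetime(['2017-01-02 09:30', '2017-01-02 09:31', '2017-01-02 09:32'])
ms5Df = pd.DataFrame({'AAA': [1.0, 1.5, 2.0], 'BBB': [10.0, 9.5, 9.0]}, index=index)
manager = FakeInstrumentManager(FakeLookbackFeatures({'ms_5': ms5Df}))

fairValue = getFairValue(updateNum=3, time=index[-1], instrumentManager=manager)
print(fairValue)
```

Because the result is a Series indexed by instrumentId, a single return value carries the prediction for every stock at once.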
Features can be called by specifying config dictionaries. Create one dictionary per feature and return them in a dictionary as market features or instrument features.
Code Snippets for all the features are available here
Feature ID | Parameters | Description |
---|---|---|
moving_average | 'featureName', 'period' | calculate rolling average of featureName over period |
moving_correlation | 'period', 'series1', 'series2' | calculate rolling correlation of series1 and series2 over period |
moving_max | 'featureName', 'period' | calculate rolling max of featureName over period |
moving_min | 'featureName', 'period' | calculate rolling min of featureName over period |
moving_sdev | 'featureName', 'period' | calculate moving standard deviation of featureName over period |
moving_sum | 'featureName', 'period' | calculate moving sum of featureName over period |
exponential_moving_average | 'featureName', 'period' | calculate exp. weighted moving average of featureName with period as half life |
argmax | 'featureName', 'period' | Returns the index where featureName is maximum over period |
argmin | 'featureName', 'period' | Returns the index where featureName is minimum over period |
delay | 'featureName', 'period' | Returns the value of featureName with a delay of period |
difference | 'featureName', 'period' | Returns the difference of featureName with its value period before |
rank | 'featureName', 'period' | Ranks last period values of featureName on a scale of 0 to 1 |
scale | 'featureName', 'period', 'scale' | Rescale last period values of featureName on a scale of 0 to scale |
ratio | 'featureName', 'instrumentId1', 'instrumentId2' | ratio of feature values of instrumentId1 / instrumentId2 |
momentum | 'featureName', 'period' | calculate momentum in featureName over period as (featureValue(now) - featureValue(now - period))/featureValue * 100 |
bollinger_bands | 'featureName', 'period' | DEPRECATED, use bollinger_bands_lower, bollinger_bands_upper as below |
bollinger_bands_lower | 'featureName', 'period' | lower bollinger bands as average(period) - sdev(period) |
bollinger_bands_upper | 'featureName', 'period' | upper bollinger bands as average(period) + sdev(period) |
cross_sectional_momentum | 'featureName', 'period', 'instrumentIds' | Returns Cross-Section Momentum of 'instrumentIds' in featureName over period |
macd | 'featureName', 'period1', 'period2' | moving average convergence divergence as average(period1) - average(period2) |
rsi | 'featureName', 'period' | Relative Strength Index - ratio of average profits / average losses over period |
vwap | - | calculated from book data as (bid price x ask volume + ask price x bid volume) / (ask volume + bid volume) |
fees | - | fees to trade, always calculated |
position | - | instrument position, always calculated |
pnl | - | Profit/Loss, always calculated |
capital | - | Spare capital not in use, always calculated |
portfolio_value | - | Total value of trading system, always calculated |
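Two of the formulas in the table can be checked by hand with a short sketch. The functions below are plain-Python illustrations of the descriptions above, not the toolbox's own implementations. The vwap description is read as a single fraction, and the population standard deviation is an assumption (the toolbox may use the sample version).

```python
import math


def vwap(bidPrice, bidVolume, askPrice, askVolume):
    # (bid price x ask volume + ask price x bid volume) / (ask volume + bid volume)
    return (bidPrice * askVolume + askPrice * bidVolume) / float(askVolume + bidVolume)


def bollingerBands(values, period):
    # Bands for the most recent `period` values: average -/+ standard deviation.
    window = values[-period:]
    average = sum(window) / float(len(window))
    # Assumption: population standard deviation over the window.
    sdev = math.sqrt(sum((v - average) ** 2 for v in window) / len(window))
    return average - sdev, average + sdev


print(vwap(bidPrice=99.0, bidVolume=300, askPrice=101.0, askVolume=100))  # 100.5
lower, upper = bollingerBands([1.0, 2.0, 3.0, 4.0, 5.0], period=5)
print(lower)
print(upper)
```

In the vwap example the bid side has more volume, so the result (100.5) sits closer to the ask price than the simple mid of 100.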