clinestanford / USUInvest18-19


Find all Pairs #1

Open Bryson14 opened 5 years ago

Bryson14 commented 5 years ago

`

# @Author Bryson Meiling
#
# Input is two tickers. The output is a correlation factor between -1 and 1.
# Makes the two lists the same length for the correlation calculation.
# If one is longer, the oldest data of the longer set is cut off.

from statsmodels.tsa.stattools import coint
import math

def pearson_coor(data1: list, data2: list) -> float:
    # Find the Pearson correlation of two lists of equal length.
    x, y = data1, data2
    assert len(x) == len(y)
    n = len(x)
    assert n > 0
    avg_x = average(x)
    avg_y = average(y)
    diffprod = 0
    xdiff2 = 0
    ydiff2 = 0
    for idx in range(n):
        xdiff = x[idx] - avg_x
        ydiff = y[idx] - avg_y
        diffprod += xdiff * ydiff
        xdiff2 += xdiff * xdiff
        ydiff2 += ydiff * ydiff
    # Note: raises ZeroDivisionError if either list is constant (zero variance).
    return diffprod / math.sqrt(xdiff2 * ydiff2)

def average(x: list) -> float:
    assert len(x) > 0
    return float(sum(x)) / len(x)

def coint_test(ticker1: list, ticker2: list) -> tuple:
    # coint returns (t-statistic, p-value, critical values), not a single float.
    return coint(ticker1, ticker2)

def ticker_list(file) -> list:
    # Read one ticker symbol per line, stripping surrounding whitespace.
    with open(file, 'r') as fil:
        data = fil.readlines()
    for i in range(len(data)):
        data[i] = data[i].strip()
    return data

# Warning: this function takes around 40-60 minutes to complete when comparing ~500 securities.
def find_all_pairs(days: int, data_dir: str):
    # The original call passed no argument to ticker_list; this assumes
    # data_dir is the path to the ticker file.
    symbols = ticker_list(data_dir)
    # symbols = symbols[:10]  # for testing, instead of all ~124000 possible pairs

    correlated = []
    cointegrated = []
    # Stock is the author's own class (not shown here).
    for i in range(len(symbols)):
        for j in range(i + 1, len(symbols)):
            if 0.95 < pearson_coor(Stock(symbols[i]).ranged_data_list('close', days),
                                   Stock(symbols[j]).ranged_data_list('close', days)):
                correlated.append([symbols[i], symbols[j]])

    for i in range(len(correlated)):
        pvalue = coint(Stock(correlated[i][0]).ranged_data_list('close', days),
                       Stock(correlated[i][1]).ranged_data_list('close', days))[1]

        if pvalue < 0.005:
            cointegrated.append([pvalue, correlated[i][0], correlated[i][1]])

    return cointegrated

`
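The header comments mention cutting off the oldest data of the longer series so both lists end up the same length, but the code above never does that step. Here is a minimal sketch of it, assuming each list is ordered oldest to newest. `align_lengths` is a hypothetical helper name, not part of the original code.

```python
def align_lengths(data1: list, data2: list) -> tuple:
    """Trim the longer list from the front so both have equal length.

    Assumes each list is ordered oldest -> newest, so dropping leading
    items discards the oldest data. Hypothetical helper, not from the
    original post.
    """
    n = min(len(data1), len(data2))
    if n == 0:
        # Guard: a slice of [-0:] would return the whole list.
        return [], []
    return data1[-n:], data2[-n:]
```

The two returned lists could then be passed straight to `pearson_coor`, which asserts that its inputs have equal length.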

In the function find_all_pairs, the only thing that needs to change is how it accesses the data. I was using a class named Stock that could return the last n days of close or open prices as a list.
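Since the Stock class is not included in this issue, here is a rough stand-in that matches the one call the code makes, `ranged_data_list(column, days)`. Everything about the storage is an assumption: one CSV per symbol at `<data_dir>/<SYMBOL>.csv`, a header row with `open` and `close` columns, and rows ordered oldest first. The `data_dir` constructor argument is also hypothetical.

```python
import csv
from pathlib import Path


class Stock:
    """Hypothetical stand-in for the Stock class mentioned above.

    Assumes one CSV per symbol at <data_dir>/<SYMBOL>.csv with a header
    row containing 'open' and 'close' columns, oldest row first.
    """

    def __init__(self, symbol: str, data_dir: str = "data"):
        self.symbol = symbol
        self.path = Path(data_dir) / f"{symbol}.csv"

    def ranged_data_list(self, column: str, days: int) -> list:
        """Return the last `days` values of `column` as floats."""
        with open(self.path, newline="") as f:
            values = [float(row[column]) for row in csv.DictReader(f)]
        # values[-0:] would return everything, so handle days <= 0 explicitly.
        return values[-days:] if days > 0 else []
```

This version rereads the file on every call, so in find_all_pairs it would be worth loading each symbol's prices once and reusing them across pairs.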

` MMM ABT ABBV ABMD ACN ATVI ADBE AMD AAP AES AET AMG AFL A APD AKAM ALK ALB ARE ALXN ALGN ALLE AGN ADS LNT ALL GOOGL GOOG MO AMZN AEE AAL AEP AXP AIG AMT AWK AMP ABC AME AMGN APH APC ADI ANSS ANTM AON AOS APA AIV AAPL AMAT APTV ADM ARNC ANET AJG AIZ T ADSK ADP AZO AVB AVY BHGE BLL BAC BK BAX BBT BDX BRK.B BBY BIIB BLK HRB BA BKNG BWA BXP BSX BHF BMY AVGO BR BF.B CHRW COG CDNS CPB COF CAH KMX CCL CAT CBOE CBRE CBS CELG CNC CNP CTL CERN CF SCHW CHTR CVX CMG CB CHD CI XEC CINF CTAS CSCO C CFG CTXS CLX CME CMS KO CTSH CL CMCSA CMA CAG CXO COP ED STZ COO CPRT GLW COST COTY CCI CSX CMI CVS DHI DHR DRI DVA DE DAL XRAY DVN DLR DFS DISCA DISCK DISH DG DLTR D DOV DWDP DTE DRE DUK DXC ETFC EMN ETN EBAY ECL EIX EW EA EMR ETR EOG EFX EQIX EQR ESS EL EVRG ES RE EXC EXPE EXPD ESRX EXR XOM FFIV FB FAST FRT FDX FIS FITB FE FISV FLT FLIR FLS FLR FMC FL F FTNT FTV FBHS BEN FCX GPS GRMN IT GD GE GIS GM GPC GILD GPN GS GT GWW HAL HBI HOG HRS HIG HAS HCA HCP HP HSIC HSY HES HPE HLT HFC HOLX HD HON HRL HST HPQ HUM HBAN HII IDXX INFO ITW ILMN IR INTC ICE IBM INCY IP IPG IFF INTU

ISRG IVZ IPGP IQV IRM JKHY JEC JBHT JEF SJM JNJ JCI JPM JNPR KSU K KEY KEYS KMB KIM KMI KLAC KSS KHC KR LB LLL LH LRCX LEG LEN LLY LNC LIN LKQ LMT L LOW LYB MTB MAC M MRO MPC MAR MMC MLM MAS MA MAT MKC MCD MCK MDT MRK MET MTD MGM KORS MCHP MU MSFT MAA MHK TAP MDLZ MNST MCO MS MOS MSI MSCI MYL NDAQ NOV NKTR NTAP NFLX NWL NFX NEM NWSA NWS NEE NLSN NKE NI NBL JWN NSC NTRS NOC NCLH NRG NUE NVDA ORLY OXY OMC OKE ORCL PCAR PKG PH PAYX PYPL PNR PBCT PEP PKI PRGO PFE PCG PM PSX PNW PXD PNC RL PPG PPL PFG PG PGR PLD PRU PEG PSA PHM PVH QRVO PWR QCOM DGX RJF RTN O RHT REG REGN RF RSG RMD RHI ROK COL ROL ROP ROST RCL CRM SBAC SCG SLB STX SEE SRE SHW SPG SWKS SLG SNA SO LUV SPGI SWK SBUX STT SRCL SYK STI SIVB SYMC SYF SNPS SYY TROW TTWO TPR TGT TEL FTI TXN TXT TMO TIF TWTR TJX TMK TSS TSCO TDG TRV TRIP FOXA FOX TSN UDR ULTA USB UAA UA UNP UAL UNH UPS URI UTX UHS UNM VFC VLO VAR VTR VRSN VRSK VZ VRTX VIAB V VNO VMC WMT WBA DIS WM WAT WEC WCG WFC WELL WDC WU WRK WY WHR WMB WLTW WYNN XEL XRX XLNX XYL YUM ZBH ZION ZTS `

This is the file it pulls the stock names from for the comparison. Thanks! Lmk how it works.
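If the symbols in that file turn out to be separated by spaces rather than one per line (an assumption, since the listing above may have lost its line breaks when pasted), splitting on whitespace handles both layouts. `read_symbols` and `pair_count` are hypothetical names used only for this sketch.

```python
def read_symbols(path: str) -> list:
    """Read ticker symbols separated by any whitespace (spaces or newlines)."""
    with open(path, "r") as f:
        return f.read().split()


def pair_count(n: int) -> int:
    """Number of unordered pairs among n symbols: n * (n - 1) / 2."""
    return n * (n - 1) // 2
```

For 500 symbols, `pair_count(500)` gives 124750, which is the scale of comparisons behind the 40-60 minute warning.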

JAMMFam commented 5 years ago

Test