abdullahkarasan / mlfrm

Chapter 7 - Ticker selection and window calculations #5

Open mshearer0 opened 2 years ago

mshearer0 commented 2 years ago
  1. The rolling_five calculation appears to loop over the first rows of the file (the INTC values) for every TICKER, regardless of the value of j. Should the subset of liq_data matching the TICKER value be selected instead, as in:

rolling_five.append(liq_data[liq_data.TICKER == j][i:i+5].agg({'BIDLO': 'min', 'ASKHI': 'max', 'VOL': 'sum', 'SHROUT': 'mean', 'PRC': 'mean'}))

This same bug seems to be present in subsequent sections as well.
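To illustrate the difference, here is a minimal sketch on a made-up two-ticker frame. The toy values, the reduced `agg_spec`, and the use of `.iloc` for positional slicing are assumptions for the example, not the book's code:

```python
import pandas as pd

# Toy stand-in for liq_data: two tickers, three rows each (values are made up).
liq_data = pd.DataFrame({
    "TICKER": ["INTC"] * 3 + ["MSFT"] * 3,
    "BIDLO": [10.0, 11.0, 12.0, 100.0, 101.0, 102.0],
    "ASKHI": [11.0, 12.0, 13.0, 110.0, 111.0, 112.0],
})
agg_spec = {"BIDLO": "min", "ASKHI": "max"}

# Unfiltered window: ignores the ticker, so it always reads the first rows of the file.
unfiltered = liq_data.iloc[0:2].agg(agg_spec)

# Filtered window: select the ticker's rows first, then take the window.
msft = liq_data[liq_data.TICKER == "MSFT"]
filtered = msft.iloc[0:2].agg(agg_spec)

print(unfiltered["BIDLO"], filtered["BIDLO"])
```

With the unfiltered slice, the MSFT "window" is built from INTC rows; filtering first gives the MSFT values.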

  2. The last 4 windows have only 4, 3, 2 and 1 rows to calculate over, so the final row is simply that row's own values of BIDLO, ASKHI, VOL, SHROUT and PRC. If the modification above is included, this occurs at the end of each TICKER's set of rows rather than just once at the end of the file.
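The shrinking windows can be seen on a short made-up series. This is only a sketch of how positional slices behave near the end of the data; the series and the `window` name are assumptions:

```python
import pandas as pd

# Seven made-up prices standing in for one ticker's PRC column.
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
window = 5

# A slice that runs past the end is silently truncated rather than raising.
sizes = [len(prices.iloc[i:i + window]) for i in range(len(prices))]
last_mean = prices.iloc[len(prices) - 1:len(prices) - 1 + window].mean()

print(sizes)      # window sizes per starting row
print(last_mean)  # mean of the final, single-row window
```

The final window contains one row, so its "mean" is just that row's value.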

  3. In the liq_ratio calculation the numerator is a sum of 5 'price x volume' products, whereas the denominator is a single difference of means. As a result the ratio comes out 5 times greater than perhaps it should. Alternatively, means could be used in the numerator as well:

liq_ratio.append((liq_vol_all[liq_data.TICKER == j]['PRC'][i+1:i+6].mean() * liq_vol_all[liq_data.TICKER == j]['VOL'][i+1:i+6].mean())/
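A small numeric check of the factor of 5, using made-up prices and volumes (a sketch, not the book's data). It also shows that the product of the two means is close to, but not identical to, the mean of the products:

```python
import pandas as pd

# Made-up five-row window of prices and volumes.
prc = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
vol = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0])

sum_of_products = (prc * vol).sum()       # numerator as a sum over 5 rows
mean_of_products = (prc * vol).mean()     # exactly sum_of_products / 5
product_of_means = prc.mean() * vol.mean()  # the form suggested in this issue

print(sum_of_products, mean_of_products, product_of_means)
```

So the sum is exactly 5 times the mean of the products, while `mean(PRC) * mean(VOL)` is a slightly different quantity whenever price and volume vary together.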

  4. When calculating the turnover ratio, if the [liq_data.TICKER == j] modification suggested above is included, the covariance calculation fails on row 233: there are only 238 INTC rows in total, so there are insufficient rows left for i, i+6 to be compared against i, i+5.

The only fix I can suggest is to skip calculating the roll value for the last 5 rows of each TICKER, but this doesn't seem very acceptable as it leaves missing values in the dataset:

for j in liq_vol_all.TICKER.unique():
    for i in range(len(liq_vol_all[liq_vol_all.TICKER == j]) - 5):
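One way to make the missing values explicit is to record NaN wherever a full pair of windows is not available. The sketch below assumes the two windows are `values[i:i+5]` and `values[i+1:i+6]`; the helper `lagged_cov` and the sample data are hypothetical, not taken from the book:

```python
import numpy as np

def lagged_cov(values, i, window=5):
    """Covariance of values[i:i+window] with values[i+1:i+1+window]; NaN if either is short."""
    a = np.asarray(values[i:i + window], dtype=float)
    b = np.asarray(values[i + 1:i + 1 + window], dtype=float)
    if len(a) < window or len(b) < window:
        return float("nan")  # np.cov needs equal-length inputs, so skip short windows
    return float(np.cov(a, b)[0, 1])

# Made-up values standing in for one ticker's series.
diffs = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0]
covs = [lagged_cov(diffs, i) for i in range(len(diffs))]
print(covs)
```

This keeps one output per input row, so the result still lines up with the ticker's rows, with the last 5 entries marked as missing rather than dropped.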

  5. In the Lhh calculation we have:

(liq_vol_all[liq_data.TICKER == j]['VOL'][i:i+5].sum() /

As a suggestion: fewer rows are available as we approach the end of the dataset, so to keep the last few values at the same magnitude as the previous ones, an option might be:

((liq_vol_all[liq_data.TICKER == j]['VOL'][i:i+5].mean()*5) /
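A quick check of what the `mean() * 5` form does, on made-up volumes (a sketch; the series and variable names are assumptions):

```python
import pandas as pd

# Made-up volumes standing in for one ticker's VOL column.
vol = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0])
window = 5

full = vol.iloc[0:window]        # complete 5-row window
short = vol.iloc[5:5 + window]   # only 2 rows remain at the end

full_sum, full_scaled = full.sum(), full.mean() * window
short_sum, short_scaled = short.sum(), short.mean() * window

print(full_sum, full_scaled)
print(short_sum, short_scaled)
```

For a complete window the two forms agree; for a truncated window the scaled mean extrapolates to a 5-row equivalent instead of shrinking with the row count.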

I stopped here with this chapter based on my comments above.