enricoschumann / NMOF

Functions, examples and data from the first and the second edition of "Numerical Methods and Optimization in Finance" by M. Gilli, D. Maringer and E. Schumann (2019, ISBN:978-0128150658). This repository mirrors https://gitlab.com/NMOF/NMOF .
http://enricoschumann.net/NMOF

Constrain number of assets in trackingPortfolio #4

Closed: Thie1e closed this issue 1 year ago

Thie1e commented 1 year ago

Hi Enrico,

I took some time to go through the help pages as well as the vignettes for NMOF, but I am still not sure if my use case is supported or not.

I am trying to understand how to build a portfolio out of stocks that tracks an index / a benchmark while holding only a certain number of stocks. For example: "Track the S&P 500 as closely as possible using 50 stocks at any time."

With trackingPortfolio I can get the weights, but only after selecting the stocks. Also, some weights will be zero, so that I don't have control over the number of assets in my portfolio. NMOF has a vignette for asset selection, but it assumes equal weighting.

Is there a way to define the number of assets for index tracking? Maybe you could guide me in the right direction.

Thank you very much in advance.

Best, Christian

enricoschumann commented 1 year ago

Hi Christian,

I am afraid the constraints you describe are not supported in trackingPortfolio.

But a similar example is described in the NMOF book, in the chapter on portfolio optimization: minimizing the variance of a portfolio under a cardinality constraint. (If you can access ScienceDirect, you can get the chapter from there. The relevant section is called "A simple hybrid: Local Search and QP".) The code is in the GitLab repository, starting here: https://gitlab.com/NMOF/NMOF2-Code/-/blob/master/14_Portfolio_optimization/R/Portfolio_Optimization.R#L912

In fact, the code in the vignette should get you started, too: as you describe it, your problem becomes straightforward once the assets are selected. So I'd run a local-search algorithm (TAopt would be my preferred choice) that selects assets, and then use the selected assets as inputs for trackingPortfolio. Essentially, this means calling trackingPortfolio in the objective function. If you don't want zero weights, you can enforce a reasonable minimum weight.

Hope that helps, Enrico

Thie1e commented 1 year ago

Hi Enrico,

thank you for your quick and helpful answer. I will go through the chapter you mentioned. The example application there (the hybrid) indeed looks quite similar to mine.

The second approach (TAopt and trackingPortfolio with non-zero weights) should work, too. Let me play around with these approaches and I will then get back to you here.

Best, Christian

Thie1e commented 1 year ago

Hi Enrico,

here is my solution using TAopt and trackingPortfolio in the objective function. Does this look OK? I could share some of the raw data privately but can't post it here.

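library(NMOF)
# Data (a list with y, X and p), wmin, wmax and n_stocks are defined elsewhere (not shown).

# Objective function:
# Tracking error after calculating weights for tracking a benchmark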
OF_track <- function(x, Data) {
  returns <- cbind(Data$y, Data$X[, x])
  cov_ret <- cov(returns)
  sol.ls <- trackingPortfolio(var = cov_ret, R = returns, wmax = wmax, wmin = wmin, method = "ls")
  port_ret <- Data$X[, x] %*% sol.ls
  return(sd(Data$y - port_ret)) # Tracking Error
}

# keep cardinality (number of stocks) constant
neighbour_cardi <- function(x, Data) {
  Ts <- which(x)
  Fs <- which(!x)
  lenTs <- length(Ts)
  O <- sample.int(lenTs, 1L)
  I <- sample.int(Data$p - lenTs, 1L)
  x[c(Fs[I], Ts[O])] <- c(TRUE, FALSE)
  x
}

# Generate a random solution for fixed number of stocks and then run TAopt
x0 <- c(rep(FALSE, Data$p - n_stocks), rep(TRUE, n_stocks))
x0 <- sample(x0, replace = FALSE)
algo <- list(nT = 7L, ## number of thresholds
             nS = 30L, ## number of steps per threshold
             nD = 200L, ## number of random steps to compute thresholds
             neighbour = neighbour_cardi,
             x0 = x0,
             printBar = TRUE)
message("Starting TAopt...")
sol1 <- TAopt(OF_track, algo = algo, Data = Data)

# Solution:
print(paste("Best solution from TAopt:", round(sol1$OFvalue, 4)))

which(sol1$xbest) ## the selected regressors

# Calculate weights using trackingPortfolio for best solution
returns <- cbind(Data$y, Data$X[, sol1$xbest])
cov_ret <- cov(returns)
sol.ls <- trackingPortfolio(var = cov_ret, R = returns, wmax = wmax, wmin = wmin, method = "ls")
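
As a quick sanity check on the neighbourhood function above, the following sketch runs the same swap logic on a made-up universe of 20 assets with 5 selected (both numbers are assumptions for illustration only). It only confirms that each move swaps exactly one asset in and one out, so the number of selected assets stays constant.

```r
# Same swap logic as neighbour_cardi above; p = 20 and 5 selected assets
# are illustrative assumptions, not values from the thread.
neighbour_cardi <- function(x, Data) {
  Ts <- which(x)                       # currently selected assets
  Fs <- which(!x)                      # currently excluded assets
  O <- Ts[sample.int(length(Ts), 1L)]  # asset to drop
  I <- Fs[sample.int(length(Fs), 1L)]  # asset to add
  x[c(I, O)] <- c(TRUE, FALSE)
  x
}

set.seed(1)
Data <- list(p = 20L)
x <- sample(c(rep(TRUE, 5L), rep(FALSE, Data$p - 5L)))

x_new <- neighbour_cardi(x, Data)
sum(x_new)       # number of selected assets after the move
sum(x != x_new)  # number of positions that changed
```

Indexing with sample.int avoids the well-known pitfall of sample() when it is handed a vector of length one.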

To test the tracking with that solution, I ran a time-series cross-validation from 2001 to 2021 for the Russell 1000. The index should be tracked by a basket of 100 stocks. The stock universe consisted of the stocks that were part of the Russell 1000 at the respective times (no survivorship bias, hopefully).

The training set was a moving window of the last 500 trading days. Each model was tested on the subsequent 40 trading days, after which the window was moved forward by 40 days. Over the period from roughly 2001 to 2021, this resulted in 112 training 'slices'.
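
The moving-window scheme described above could be set up roughly as follows. This is only a sketch: the helper make_slices and the example length of 1000 observations are hypothetical and not taken from the actual backtest.

```r
# Rolling-window splits: train on `train_len` days, test on the next
# `test_len` days, then move the window forward by `test_len` days.
# Assumes n_obs >= train_len + test_len.
make_slices <- function(n_obs, train_len = 500L, test_len = 40L) {
  starts <- seq(1L, n_obs - train_len - test_len + 1L, by = test_len)
  lapply(starts, function(s) {
    list(train = s:(s + train_len - 1L),
         test  = (s + train_len):(s + train_len + test_len - 1L))
  })
}

slices <- make_slices(n_obs = 1000L)  # 1000 is an arbitrary example length
length(slices)                        # number of train/test slices
range(slices[[1]]$train)              # first training window
range(slices[[1]]$test)               # first test window
```

Each slice holds row indices, so the training and test returns for slice i would be selected with something like R[slices[[i]]$train, ] and R[slices[[i]]$test, ], where R stands for a returns matrix.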

Using trackingPortfolio in the objective function is of course relatively slow. With the lowered numbers in algo the cross validation still took around 6 hours.

This is what the result looks like:

[Figure: backtest result with TAopt]

The backtest excludes transaction costs.

What about unadjusted prices? I am not sure whether the Russell 1000 includes dividends. I assume it does not, so I used unadjusted prices. The tracking seems to be better when using unadjusted prices. But in reality the investor would of course receive those dividends, so the backtest with adjusted prices should be the more 'realistic' one, right? Of course, costs would also have to be considered.

Before I discovered NMOF or TAopt, I used glmnet to regress the index returns on the returns of the individual stocks. With alpha = 1 (lasso regression) I can then pick the penalty parameter such that I get the desired number of stocks in the stock portfolio.
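
A minimal sketch of that lasso approach, on simulated data, might look as follows. The sizes n, p and k and the simulated returns are assumptions for illustration; the actual data and settings were not posted. Note that the lasso coefficients are not constrained to sum to one or to respect weight bounds, so they would still need to be turned into portfolio weights.

```r
library(glmnet)

set.seed(42)
n <- 500L; p <- 200L; k <- 20L              # illustrative sizes only
X <- matrix(rnorm(n * p, sd = 0.01), n, p)  # simulated stock returns
colnames(X) <- paste0("S", seq_len(p))
y <- rowMeans(X) + rnorm(n, sd = 0.001)     # simulated "index" returns

# Lasso path (alpha = 1); a longer path gives a finer grid of model sizes
fit <- glmnet(X, y, alpha = 1, nlambda = 200)

# fit$df holds the number of non-zero coefficients for each lambda.
# Take the smallest penalty whose model uses at most k stocks.
idx <- max(which(fit$df <= k))
lambda_k <- fit$lambda[idx]

b <- as.matrix(coef(fit, s = lambda_k))[-1L, 1L]  # drop the intercept
selected <- names(b)[b != 0]
length(selected)                                  # at most k
```

Because the path is evaluated on a grid of penalties, the selected model may contain fewer than k stocks; a finer grid (larger nlambda) makes it more likely to hit k exactly.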

The tracking with this method is comparable (?) to the tracking by TAopt and it runs much faster, of course. The above simulation with TAopt took about 6 hours (single threaded). With glmnet it finishes in about 15 minutes. Result using glmnet:

[Figure: backtest result with glmnet]

Regarding the tracking error (TE): If I calculate the tracking error as the standard deviation of index returns minus portfolio returns based on the end-of-year values in my simulation, I get

TE with TAopt (same simulation as the chart above): 6.2%
Ann. return with TAopt: 10.6% (benchmark 10.3%)
TE with glmnet: 3.57%
Ann. return with glmnet: 9.5% (benchmark 10.3%)

Another observation from my backtests is that the portfolios selected by TAopt seem to be less stable than the ones selected by glmnet. Again, the portfolios consisted of 100 stocks that should track the Russell 1000. With TAopt I get a median of 85 differing tickers between time slices in the simulation (so nearly the whole portfolio gets exchanged every 40 days). With glmnet I get a median of 20 differing tickers. This more stable result from glmnet would of course be important in practice because of transaction costs.
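
One possible way to count "differing tickers" between consecutive slices is sketched below. The ticker names are made up, and the definition used here (tickers held in a slice but not in the previous one) is an assumption; the thread does not say exactly how the number was computed.

```r
# Hypothetical holdings per slice (ticker names are made up)
holdings <- list(
  c("AAA", "BBB", "CCC", "DDD"),
  c("AAA", "BBB", "EEE", "FFF"),
  c("AAA", "GGG", "EEE", "FFF")
)

# Number of tickers in slice t that were not held in slice t - 1
n_new <- mapply(function(prev, cur) length(setdiff(cur, prev)),
                head(holdings, -1L), tail(holdings, -1L))
n_new          # 2 new tickers in slice 2, 1 new ticker in slice 3
median(n_new)  # 1.5
```

With a fixed number of holdings, the number of tickers entering equals the number leaving, so counting one side is enough.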

Another difference between the models is the distribution of portfolio weights. I put trackingPortfolio into the objective function with a minimum weight of 0.001. This results in the following distribution of portfolio weights (all CV slices combined):

[Figure: distribution of portfolio weights with TAopt]

So it seems to me that TAopt sets a lot of weights to the minimum to filter out these stocks as far as possible.

The distribution of weights from glmnet is much smoother:

[Figure: distribution of portfolio weights with glmnet]

Sorry that this response has become so long and thank you very much, again, for your answers. So to sum up, my questions are:

  1. Is the implementation of TAopt with trackingPortfolio correct?
  2. Adjusted vs. unadjusted prices
  3. Tracking error: What would be an acceptable tracking error, given my simulation above? Should I check the results for TAopt, since the tracking error is higher than with glmnet?
  4. Any comments you might have on the stability of selected stocks and the distribution of portfolio weights.

Several years ago by now, I also studied econometrics, but I was never concerned with these types of portfolio optimizations. I simply have the feeling that there must be a better way than my glmnet method that I have just 'made up' (although there are some papers using glmnet for portfolio optimization, I think).

Best, Christian

Edit: I have updated the figure and numbers for TAopt. I think I reported the results with adjusted instead of unadjusted prices before. The results are from a backtest with a window of 150 days; let me rerun the test with a window of 40 days overnight...

Thie1e commented 1 year ago

Here's the backtest based on TAopt and a CV window of 40 days (the training set is still a moving window of the last 500 days):

[Figure: backtest result with TAopt, CV window of 40 days]

This time I get a tracking error of 4.7% and a slightly lower ann. return. It seems to me that these differences are somewhat random, caused by the different CV-window lengths.

enricoschumann commented 1 year ago

1) I had a quick look, and it seems okay. (But to be sure, I'd need a code example I can really run.) Is there any particular reason why you used method "ls" in trackingPortfolio? This will be much slower than "qp".

2) Unadjusted prices are what you trade. So it is fine to use them.

3) There is no single answer to that question; but a difference in tracking errors of three percentage points (as between TAopt and glmnet) is huge (provided that the glmnet answers do not violate any constraints).
So my guess is that TAopt would need more time to find a good solution. But it's hard to say without a reproducible example.

4) TAopt is a stochastic method, and so repeated runs might have different results. However, this randomness can be made very small, by allowing more iterations. This is discussed a lot in the NMOF book, and also in https://link.springer.com/article/10.1007/s10732-010-9138-y or https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1140655 .
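
The effect of the number of iterations on the randomness of the results can be explored with restartOpt from NMOF, which runs an optimisation method several times. The sketch below uses simulated returns and a deliberately simple objective (an equal-weight basket instead of trackingPortfolio) to keep it fast; all sizes and settings are assumptions for illustration.

```r
library(NMOF)

set.seed(123)
n <- 250L; p <- 50L; k <- 10L               # illustrative sizes only
X <- matrix(rnorm(n * p, sd = 0.01), n, p)  # simulated asset returns
y <- rowMeans(X[, 1:20])                    # simulated benchmark returns
Data <- list(X = X, y = y, p = p)

# Toy objective: tracking error of an equal-weight basket of selected assets
OF <- function(x, Data)
  sd(Data$y - rowMeans(Data$X[, x, drop = FALSE]))

# Swap one selected asset for one unselected asset
neighbour <- function(x, Data) {
  Ts <- which(x)
  Fs <- which(!x)
  x[c(Fs[sample.int(length(Fs), 1L)],
      Ts[sample.int(length(Ts), 1L)])] <- c(TRUE, FALSE)
  x
}

# Random starting solution with exactly k assets
x0 <- function() sample(c(rep(TRUE, k), rep(FALSE, p - k)))

# Run TAopt 10 times with nS steps per threshold; collect objective values
run <- function(nS) {
  algo <- list(nT = 10L, nS = nS, nD = 200L,
               neighbour = neighbour, x0 = x0,
               printBar = FALSE, printDetail = FALSE)
  res <- restartOpt(TAopt, n = 10L, OF = OF, algo = algo, Data = Data)
  sapply(res, `[[`, "OFvalue")
}

of_few  <- run(20L)    # few steps per threshold
of_many <- run(500L)   # many steps per threshold
summary(of_few)
summary(of_many)
```

One would typically expect the objective values from the longer runs to be lower and less dispersed, but this is something to verify on the actual problem, not a guarantee.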

Thie1e commented 1 year ago

Hi Enrico,

thank you for answering again.

  1. I used "ls" because "qp" was giving errors, but it does not do that anymore. So I switched to "qp".
  2. I was unsure what the standard method for reporting the tracking error is, but without annualization it seems to be simply sd(index_returns - portfolio_returns). Then with daily rebalancing over a test period of 84 days NMOF gives a TE = 0.0013 and glmnet has a TE = 0.0014 (but with lower turnover). So very close.
  3. Yes, I have increased the parameters of TAopt a lot (close to the ones used in the manual) and the results are more stable now.
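
The tracking-error definition from point 2 can be sketched as follows. The return series are simulated, and the annualisation with the square root of 252 trading days is a common convention that assumes serially uncorrelated return differences; it is an assumption here, not something stated in the thread. Under that convention, a daily TE of 0.0013 would correspond to roughly 2% per year.

```r
set.seed(7)
index_returns     <- rnorm(84, mean = 0.0004, sd = 0.010)    # simulated
portfolio_returns <- index_returns + rnorm(84, sd = 0.0013)  # simulated

# Tracking error per period: standard deviation of the return differences
te_daily <- sd(index_returns - portfolio_returns)

# Annualised, assuming 252 trading days and uncorrelated differences
te_annual <- te_daily * sqrt(252)

c(daily = te_daily, annual = te_annual)
```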

I think I have a better understanding of methods for index tracking now and since I got trackingPortfolio to work in TAopt I am going to close this issue now. Thanks again for your help.