Closed seanlaw closed 9 months ago
Hi Sean! Thanks for your interest in our implementation of the SB algorithm.
This package actually already handles the Markowitz portfolio integer optimization problem thanks to the Markowitz
API in the simulated_bifurcation.models
module. However, the current implementation only allows optimization on a single timestamp instead of several consecutive ones, i.e. solving:
$$\hbox{argmax } w^T \mu - \frac{\gamma}{2} w^T \Sigma w$$
for a given risk coefficient $\gamma$, expected return vector $\mu$, and covariance matrix $\Sigma$.
Adding a sequential version of this problem with multiple coupled timestamps and a global optimization as a sum (similar to the paper you mentioned) is definitely something we will implement for the next release of the package. The goal will be to extend and generalize the API so that the problem becomes:
$$\hbox{argmax } \sum{t = 1}^{T} w{t}^{T} \mu{t} - \frac{\gamma}{2} w{t}^{T} \Sigma{t} w{t} - \Delta w_t^T \Lambda_t \Delta w_t$$
where $\Lambda_t$ accounts for the rebalancing cost resulting from the stock difference $\Delta w_t = wt - w{t-1}$ between two consecutive timestamps.
We will let you know when this feature is added to the package. In the meantime, you can try the Markowitz API included in the last release of the package.
Edit: after further checks, it appears the
Markowitz
API may not work as expected since the source code was not updated with the new Polynomial API introduced by version1.2.0
. Please find below a way to implement a Markowitz model using thesimulated_bifurcation
package (the current code will be fixed at the same time as the development of the new feature).
Given a covariance
matrix and an expected_return
vector (as numpy arrays or torch tensors), you can set the risk coefficient of the Markowitz model to match your risk aversion. You can also choose the number of bits to encode the stocks. By default, this value is 1, which means that the decision is binary (either hold an asset, or sell it). Increasing the number of bits used to encode the weights can refine this selection by weighting the set of selected stocks (but unfortunately this can reduce the precision by introducing new spins in the SB optimization).
from typing import Optional, Union
import numpy as np
import torch
from simulated_bifurcation.polynomial import IntegerPolynomial
class Markowitz(IntegerPolynomial):
"""
A representation of the Markowitz model for portfolio optimization.
Portfolio only takes integer stocks.
"""
def __init__(
self,
covariance: Union[torch.Tensor, np.ndarray],
expected_return: Union[torch.Tensor, np.ndarray],
risk_coefficient: float = 1,
number_of_bits: int = 1,
dtype: torch.dtype = torch.float32,
device: str = "cpu",
) -> None:
# Data
super().__init__(
-risk_coefficient / 2 * covariance,
expected_return,
None,
number_of_bits,
dtype,
device,
)
self.risk_coefficient = risk_coefficient
@property
def covariance(self) -> torch.Tensor:
return -(2 / self.risk_coefficient) * self.matrix
@property
def expected_return(self) -> torch.Tensor:
return self.vector
@property
def portfolio(self) -> Optional[torch.Tensor]:
return (
self.sb_result[:, torch.argmax(self(self.sb_result.t())).item()]
if self.sb_result is not None
else None
)
@property
def gains(self) -> float:
return self(self.portfolio) if self.portfolio is not None else 0.0
# Create the Markowitz model
markowitz = Markowitz(
covariance,
expected_return,
risk_coefficient = 1,
number_of_bits = 1,
)
# Optimize with SB
markowitz.maximize()
# Get the stocks for each asset as a tensor
print(markowitz.portfolio)
# Get the expected gains from this portfolio (objective function)
print(markowitz.gains)
If you have any other ideas, suggestions, or requests for models to add to the package to expand the pool of problems that can be solved with the SB API, please feel free to open new issues like this one and we will be happy to have a look at it.
@bqth29 Thank you for your thorough explanation. I will give it a try once I better understand the necessary inputs and I'm certain that I'll have more questions. SB is pretty awesome!
If you have any other ideas, suggestions, or requests for models to add to the package to expand the pool of problems that can be solved with the SB API, please feel free to open new issues like this one and we will be happy to have a look at it.
Sounds good. I'm completely new to SB (and optimization) and, for me, the hardest part is being able to map the optimization problem into an Ising one and my suspicion is that others may have a similar issue. And so I think being able to see a multiple different end-to-end examples along with the required raw inputs (and their transformation into SB required inputs) would really help bridge the (understanding) gap. Your team's explanations have been patient and thorough so I have little doubt that you will be able to build a strong community! Keep up the great work
@bqth29 On a somewhat related note, would you also happen to have a Jupyter notebook that uses this package and reproduces all of the examples in your work on "Approximating Optimal Asset Allocations using Simulated Bifurcation"? Are the various datasets that you used available as well?
@bqth29 On a somewhat related note, would you also happen to have a Jupyter notebook that uses this package and reproduces all of the examples in your work on "Approximating Optimal Asset Allocations using Simulated Bifurcation"? Are the various datasets that you used available as well?
Dear Sean,
Unfortunately we carried out this work more than two years ago and we lost the source code we had used back then. However, we still are in our possession of the data (cov.csv
and mu.csv
files - see the appendix at the end of this message) because we pushed them to this repository in an earlier version of this code.
Yet, the implementation of the sequential Markowitz model we discussed above is almost ready and a pull request (#19) has been opened. We plan to write a Jupyter Notebook or a Google Colab Notebook when the PR is merged with new data we gathered to take in account the rebalancing costs. We can also add a small example relying on the old S&P500 dataset we used for our previous work. Be sure that we will inform you as soon as it is ready.
In the meantime, you can try running the following code snippets to use the Markowitz SB API on a real dataset and mimic the experiments we pursued in the work you mentioned.
You shall first checkout on the
codecov-models
branch first until the PR #19 is merged so you use the fixed version of theMarkowitz
model
Create the covariance matrix and expected returns vector from the data
The time data must be pre-processed in the first place for a given date (namely March 1st 2021) in order to create a proper covariance matrix and an expected returns vector.
import numpy as np
import pandas as pd
sigma_path = "path/to/covariance.csv"
mu_path = "path/to/mu.csv"
complete_monthly_returns = pd.read_csv(mu_path)
complete_monthly_returns.set_index('Date', inplace = True)
sigma = pd.read_csv(sigma_path)
sigma.set_index('Unnamed: 0', inplace = True)
assets = sigma.index.to_list()
expected_returns = np.expand_dims(complete_monthly_returns.loc["2021-03-01"].to_numpy(), 1)
covariance = sigma.to_numpy()
Instantiate the Markowitz model from the data and optimize it with SB
Below you can find how to compute the one-bit weights case we addressed in our work by only allowing binary weights for the stocks (either an asset is bought either it is discarded).
You can also try with a higher number of bits but the accuracy may be less satisfying.
import simulated_bifurcation as sb
markowitz = sb.models.Markowitz(covariance, expected_returns, number_of_bits=1)
markowitz.maximize()
Compute the gains from the optmized model
print(f"Total gains: {markowitz.gains}")
To give an order of magnitude, the best gains value we found is 2.843490
Create a dataframe portfolio and visualize the stocks on a pie chart
portfolio = pd.DataFrame(
{
"assets": assets,
"stocks": [int(x) for x in markowitz.portfolio.tolist()]
}
)
# Only keep the non-null stocks
portfolio = portfolio[portfolio.stocks > 0]
A nice way to dynamically visualize the portfolio is to use the plotly
package. For the one-bit weights case, the render may be hard to read. Try increasing the number of bits for a better visualization.
import plotly.express as px
px.pie(names=portfolio.assets, values=portfolio.stocks)
Appendix: links to the data
The data is too big to be attached to this message. Instead, you can find below the permalinks to the files as we used them on this repository several months ago:
Thank you for your prompt response @bqth29! I will take a look
@bqth29 Are you able to help me understand how to reproduce this portfolio optimization example using SB?
I've managed to retrieve the data with:
import requests
import torch
url = "http://people.brunel.ac.uk/~mastjjb/jeb/orlib/files/port5.txt"
response = requests.get(url)
lines = (l.strip() for l in response.text.split('\n'))
n_assets = int(next(lines))
mean_return = torch.empty(n_assets, dtype=torch.float64)
stddev_return = torch.empty(n_assets, dtype=torch.float64)
for i in range(n_assets):
mean_return[i], stddev_return[i] = (float(l.strip()) for l in next(lines).split())
corr = torch.full((n_assets, n_assets), torch.nan, dtype=torch.float64)
for l in lines:
if l != "":
i, j, value = l.split()
i, j, value = int(i)-1, int(j)-1, float(value)
corr[i, j], corr[j, i] = value, value
covar = corr * (stddev_return * stddev_return)
r = 0.002 # 0.2 percent expected return
However, it isn't clear how to appropriately:
r
Any suggestions/guidance would be greatly appreciated!
Hello Sean,
Here is a short answer. I do not have the time to write a detailed response right now, but if you need more details, do not hesitate to ask.
If you want to add a constraint of the type $\displaystyle \sum_{i=1}^n \alpha_i x_i = \beta$, simply add the following cost function to your existing one:
H_{eq} = \lambda \left( \displaystyle \sum_{i=1}^n \alpha_i x_i - \beta \right)^2
You need to choose $\lambda > 0$ such that violating a constraint will always be more costly than what it can bring in your original problem.
Let's assume I have $n$ binary variables, and I want to have between $a$ and $b$ binary variables equal to 1. In that case, introduce $b - a$ auxiliary variables $y_i$. Add this cost function to your existing one:
H_{eq} = \lambda \left(a + \displaystyle \sum_{i=1}^{b - a} y_i - \displaystyle \sum_{i=1}^n x_i \right)^2
Again, choose $\lambda > 0$ such that violating a constraint will always be more costly than what it can bring in your original problem. Also, as I write this, I realize all the $y_i$ play a symmetrical role, leading to a degenerate ground state. It might be better to use a slightly different embedding.
Note: A similar reasoning works for constraining a subset of the variables.
Note 2: There is also something called the log trick to use $\log(b - a)$ auxiliary variables instead of $b - a$. However, it seems (take it with a pinch of salt) that this does not work very well with SB. This is something we would like to investigate further at some point.
As of now, I don't see an immediate solution (at least not without introducing a significant number of auxiliary variables). I'll need to think more about it to come up with a viable approach.
I hope this helps. Let me know if you have any more questions!
In this paper, they demonstrate the use of SB for solving the portfolio integer optimization problem. I'm sure you are super busy but it would be useful to reproduce this work on different number of assets in order to demonstrate both the quality as well as scalability of this package.