dwavesystems / dimod

A shared API for QUBO/Ising samplers.
https://docs.ocean.dwavesys.com/en/stable/docs_dimod/
Apache License 2.0
122 stars 81 forks source link

CQM unexpected behaviour, less-than-or-equal (<=) constraint is off-by-one #1313

Closed wilhelmagren closed 1 year ago

wilhelmagren commented 1 year ago

Description

TL;DR When sampling from a CQM with a linear constrain of binary variables, e.g., $x_0 + x_1 + x_2 <= 2$, where $x_i\in\{0, 1\}$, the upper bounds of the Sense is not respected. The sampler produces solutions where the bounds is off-by-one, meaning, given a bias $B$, I get solutions for $B+1$.

In detail:

I have defined a portfolio optimization problem using qiskit_finance of $N$ asset and set the budget constrain as $B = N / 2$. When solving with the qiskit NumPyMinimumEigensolver for the ground-truth optimal portfolio I get a solution with $B$ amount of assets chosen.

I then convert the qiskit quadratic_program into a BinaryQuadraticModel in the following way:

BQM = BinaryQuadraticModel(
    quadratic_program.objective.linear.to_dict(),
    quadratic_program.objective.quadratic.to_dict(),
    quadratic_program.objective.constant,
    Vartype.BINARY,
)

and then create a ConstrainedQuadraticModel from the BQM in the following way, and add the constraint that the sum of all binary variables should at most be equal to our budget $B$:

CQM = ConstrainedQuadraticModel.from_bqm(BQM)

constraint_label = CQM.add_constraint_from_iterable(
    [(variable, 1) for variable in CQM.variables],
    '<=',
    rhs=B,
)

When printing the CQM both the objective and constraints I can verify that the budget constraint is set correctly. But when running and using the LeapHybridCQMSampler, getting the optimal sample, I for some problem instances get a solution with $B + 1$ variables marked.

This problem is not always present, because sometimes the optimal solution does not require $B$ amount of assets to be chosen. But the bug happens fairly often when randomizing the input data, leading me to think that there might be a simple off-by-one or rounding bug somewhere in the source code.

Steps To Reproduce

I will give a specific instance of the problem I am solving, which always leads to the unexpected behaviour. It requires you to have ready two API tokens, one from Nasdaq data link to get the stock data (its a free signup and usage), and your D-Wave Leap API token.

from qiskit_finance import QiskitFinanceError
from qiskit_finance.data_providers import *
from qiskit_finance.applications.optimization import PortfolioOptimization
from qiskit_optimization.problems import QuadraticProgram
from dimod import BinaryQuadraticModel
from dimod import ConstrainedQuadraticModel
from dimod import Vartype
from dimod import Binary
from dwave.system import LeapHybridCQMSampler
from datetime import datetime

NASDAQ_TOKEN = ''

n_qubits = 10
stocks = [
    'AXP', 'BA', 'CPB', 'CAT', 'CVX', 'CSCO', 'KO', 'CL', 'DIS', 'FDX',
]
n_assets = len(stocks)

assert n_assets == n_qubits

try:
    dataset = WikipediaDataProvider(
        token=NASDAQ_TOKEN,
        tickers=stocks,
        start=datetime(2012, 1, 1),
        end=datetime(2016, 12, 31),
    )
    dataset.run()
except QiskitFinanceError as error:
    print(
        'Error retrieving the data, ', error,
    )

q = 0.5
B = n_assets // 2

print(
    f'Risk factor {q=}\tBudget {B=}'
)

mu = dataset.get_period_return_mean_vector()
sigma = dataset.get_period_return_covariance_matrix()

problem = PortfolioOptimization(
    expected_returns=mu,
    covariances=sigma,
    risk_factor=q,
    budget=B,
)

quadratic_program = problem.to_quadratic_program()
print(quadratic_program)

BQM = BinaryQuadraticModel(
    quadratic_program.objective.linear.to_dict(),
    quadratic_program.objective.quadratic.to_dict(),
    quadratic_program.objective.constant,
    Vartype.BINARY,
)

CQM = ConstrainedQuadraticModel.from_bqm(BQM)

constraint_label = CQM.add_constraint_from_iterable(
    [(variable, 1) for variable in CQM.variables],
    '<=',
    rhs=B,
)

print(CQM)

DWAVE_TOKEN = ''
sampler = LeapHybridCQMSampler(token=DWAVE_TOKEN)

label = f'CQM optimization {n_assets} assets'

sample_set = sampler.sample_cqm(CQM, label=label)

sample = sample_set.first.sample

solution = {
    s: sample[i] for i, s in enumerate(stocks)
}

for stock, amount in solution.items():
    print(stock, amount)

The last print yields the following output:

AXP 0.0
BA 1.0
CPB 1.0
CAT 0.0
CVX 0.0
CSCO 1.0
KO 0.0
CL 1.0
DIS 1.0
FDX 1.0

and the amount of 'picked' variables is $>B$.

Expected Behavior

I expect the sampler to only produce solutions that are feasible according to the defined constraint. Meaning, number of 'picked' variables $\leq B$.

Environment

Additional Context

The code is originally being run in a Jupyter Notebook.

arcondello commented 1 year ago

Hi @willeagren ,

Thanks for the detailed report! Would it be possible for us to also get the intermediate CQM? You can serialize it with

import zipfile
import shutil

import dimod

cqm = ...

with zipfile.ZipFile("example.zip", "w") as zf:
    with zf.open("example.cqm", "w") as f:
        shutil.copyfileobj(cqm.to_file(), f)
wilhelmagren commented 1 year ago

Hi @arcondello ,

I'm not quite sure I managed to serialize it properly. The intermediate "example.cqm" file looks quite strange. But here is the zip file: example.zip

Just making sure, there's no way to retrieve my D-Wave Leap API token from the serialized CQM right?

arcondello commented 1 year ago

Looks correct to me! I'll take a look and get back to you when I know more.

Just making sure, there's no way to retrieve my D-Wave Leap API token from the serialized CQM right?

That's correct, it only contains the CQM itself, i.e. the biases, variables, bounds, and labels.

arcondello commented 1 year ago

Ok, I see the issue. Sorry, I should have noticed it earlier. You just need to change the line where you get the lowest energy sample from

sample = sample_set.first.sample

to

sample = sample_set.filter(lambda d: d.is_feasible).first.sample

In the former you are selecting the sample with the lowest energy in terms of the objective. Since in this problem it is energetically advantageous to violate the constraint, you are selecting infeasible solutions. In the latter, you are first filtering for only feasible solutions.

We're considering making the filtering behaviour default in future versions of Ocean, to avoid this sort of confusion. But for now, filtering should resolve the issue.

wilhelmagren commented 1 year ago

Ah ok, thanks! I would've assumed that the sampling of the CQM would only consider states that are feasible according to the constraint(s). I think the default filtering behaviour you mention would be great, since that was kind of how we expected it to behave.

Otherwise, one could just work with the DQM and then filter manually after sampling.. perhaps that's how everything works under the hood(?)

Anyway, thanks for the help! I'll close the issue 😄