dwavesystems / dwave-optimization

Enables the formulation of nonlinear models for industrial optimization problems.
https://docs.ocean.dwavesys.com/en/stable/docs_optimization/index.html#index-optimization
Apache License 2.0

Add symbol labels #17

Open arcondello opened 3 months ago

arcondello commented 3 months ago

We need the ability to label symbols and then retrieve them by their labels. Unfortunately, this will have to wait until after launch.

arcondello commented 2 months ago

Some potential approaches

1) Symbols "know" their labels

In this approach each symbol has a label associated with it, and has access to that label. So the API could be something like

model = Model()
x = model.binary()
x.label = "x1"

for symbol in model.iter_symbols():
    print(symbol.label)

which would print x1.

From an implementation perspective, there are two ways to do this

1a) C++ nodes add generic storage

Something like

struct NodeStorage {
    virtual ~NodeStorage() = default;
};

struct Node {
    // Hold an (uninitialized) storage pointer that can be initialized by Cython
    std::unique_ptr<NodeStorage> storage_ptr;

    // existing Node implementation here
};

We could then subclass NodeStorage from Cython and use that space to store Python objects such as labels. There are definitely some ref-counting issues here, but nothing that can't be handled carefully.

The cost of this approach is one pointer per node, so potentially ~16 MB in the worst case assuming our current 2,000,000-node upper bound, not counting whatever is actually stored in it.
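For concreteness, the worst-case overhead arithmetic (assuming 64-bit, i.e. 8-byte, pointers):

```python
# One extra 8-byte pointer per node, at the worst-case
# model size of 2,000,000 nodes mentioned above.
num_nodes = 2_000_000
pointer_bytes = 8
print(num_nodes * pointer_bytes / 1e6)  # 16.0 (MB)
```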

This also has the disadvantage of not (easily) allowing retrieval by label, because you'd need to traverse all of the nodes looking for a specific (not necessarily unique) label.
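To make that retrieval cost concrete, here is a pure-Python mock (the MockNode class and find_by_label helper are stand-ins, not the real API) showing that lookup by label degenerates into a linear scan over all nodes, and that a non-unique label can match several of them:

```python
class MockNode:
    """Stand-in for a C++ node whose generic storage holds a label."""
    def __init__(self, label=None):
        self.label = label

def find_by_label(nodes, label):
    # Without an index, every lookup must walk all of the nodes,
    # and a label may legitimately match more than one of them.
    return [node for node in nodes if node.label == label]

nodes = [MockNode("x"), MockNode("y"), MockNode("x")]
print(len(find_by_label(nodes, "x")))  # 2 -- labels need not be unique
```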

1b) Cython node storage

In this approach, we let Cython store the node data. So something like


cdef class Model:
    cdef dict node_ptr_to_label
    cdef dict label_to_node_ptr

    def __cinit__(self):
        self.node_ptr_to_label = dict()
        self.label_to_node_ptr = dict()

where we store a mapping from labels to nodes and vice versa. There is definitely a source-of-truth issue here when/if we add node removal (https://github.com/dwavesystems/dwave-optimization/issues/41).

In the example above, I've also used ptr under the assumption that we don't want to keep the Symbols themselves to save on memory. But of course that would be simpler. We could also do unordered_map[PyObject*, Node*] and vice versa rather than dict. But that has its own issues with ref-counting.
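A rough pure-Python sketch of the two-dict bookkeeping (using an integer in place of the C++ node pointer; the LabelRegistry name and its methods are hypothetical), which also shows where the source-of-truth hazard lives if node removal is added:

```python
class LabelRegistry:
    """Keeps the label <-> node-pointer maps consistent. Both dicts
    must always be updated together; forgetting one side is exactly
    the source-of-truth hazard noted above."""
    def __init__(self):
        self.node_ptr_to_label = {}
        self.label_to_node_ptr = {}

    def set_label(self, node_ptr, label):
        # Drop any stale reverse entry before overwriting.
        old = self.node_ptr_to_label.get(node_ptr)
        if old is not None:
            del self.label_to_node_ptr[old]
        self.node_ptr_to_label[node_ptr] = label
        self.label_to_node_ptr[label] = node_ptr

    def remove_node(self, node_ptr):
        # If node removal ever lands, skipping this step would
        # leave a dangling label -> pointer entry.
        label = self.node_ptr_to_label.pop(node_ptr, None)
        if label is not None:
            del self.label_to_node_ptr[label]

registry = LabelRegistry()
registry.set_label(0x1234, "x1")
print(registry.label_to_node_ptr["x1"])  # 4660
```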

2) Pyomo-style labels

Pyomo, a popular package for mathematical optimization, encourages users to simply save their labels on the models as attributes. See for example their working with pyomo docs.

In that case, we would simply encourage users to do

model = Model()
model.x1 = model.binary()

This already works out of the box, so the change would be to encourage this behavior. It's definitely very simple and supports arbitrary storage structures (e.g. model.x = [model.binary(), model.integer()] works fine). The cost is that symbols don't "know" their labels, so we couldn't, for instance, use the labels when printing. But I am not sure how much of a cost that is.

We could also potentially add some sort of traversal mechanism by traversing the model __dict__.

>>> model = Model()
>>> model.x = model.binary()
>>> model.__dict__
{'x': <dwave.optimization.symbols.BinaryVariable at 0x7f27ea58c6a0>}

This approach also keeps the symbols alive as Python objects, which can get expensive.
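A sketch of what such a traversal mechanism could look like, using dummy stand-ins (MockSymbol, MockModel) for the real model and symbol classes:

```python
class MockSymbol:
    """Dummy stand-in for the dwave.optimization symbol classes."""

class MockModel:
    def binary(self):
        return MockSymbol()

model = MockModel()
model.x = model.binary()
model.y = model.binary()
model.note = "not a symbol"

# Walk the instance __dict__ and keep only symbol-valued attributes,
# treating the attribute name as the label.
labels = {name: obj for name, obj in vars(model).items()
          if isinstance(obj, MockSymbol)}
print(sorted(labels))  # ['x', 'y']
```

Handling nested storage (e.g. a list of symbols under one attribute) would need a recursive walk, which is one reason this stays a sketch.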