Open arcondello opened 5 months ago
Some potential approaches
In this approach each symbol has a label associated with it, and has access to that label. So the API could be something like
model = Model()
x = model.binary()
x.label = "x1"
for symbol in model.iter_symbols():
print(symbol.label)
which would print x1
.
From an implementation perspective, there are two ways to do this
Something like
struct NodeStorage {
virtual ~NodeStorage() = default;
};
struct Node {
// Hold an (uninitialized) storage pointer that can be initialized by Cython
std::unique_ptr<NodeStorage> storage_ptr;
// existing Node implementation here
};
we could then subclass NodeStorage
from Cython, and use that space to put Python objects like labels. There are definitely some issues here with ref-counting, but nothing that can't be done carefully.
The cost of this approach is one pointer per node, so potentially ~16 MB in the worst case, assuming our current 2,000,000 node number. Not counting whatever is actually stored in it.
This also has the disadvantage of not (easily) allowing retrieval by label. Because you'd need to traverse all of the nodes looking for a specific (not necessarily unique) label.
In this approach, we let Cython store the node data. So something like
cdef class Model:
cdef object node_ptr_to_label = dict()
cdef object label_to_node_ptr = dict()
where we store a mapping from labels to nodes and vice versa. There is definitely a source-of-truth issue here when/if we add node removal (https://github.com/dwavesystems/dwave-optimization/issues/41).
In the example above, I've also used ptr
under the assumption that we don't want to keep the Symbol
s themselves to save on memory. But of course that would be simpler. We could also do unordered_map[PyObject*, Node*]
and vice versa rather than dict
. But that has its own issues with ref-counting.
Pyomo
, a popular package for mathematical optimization, encourages users to simply save their labels on the models as attributes. See for example their working with pyomo docs.
In that case, we would simply encourage users to do
model = Model()
model.x1 = model.binary()
This already works out-of-the box, so the change would be to encourage this behavior. It's definitely very simple and supports arbitrary storage structured (e.g. model.x = [model.binary(), model.integer()]
works fine). The cost is that symbols don't "know" their labels, so we couldn't, for instance, use the labels when printing. But I am not sure how much of a cost that is.
We could also potentially add some sort of traversal mechanism by traversing the model __dict__
.
>>> model = Model()
>>> model.x = model.binary()
>>> model.__dict__
{'x': <dwave.optimization.symbols.BinaryVariable at 0x7f27ea58c6a0>}
This also keeps the symbols as Python objects, which gets expensive.
We need the ability to label symbols and then retrieve them from their labels. Unfortunately for after launch though.