Closed irm-codebase closed 1 week ago
We've investigated sparse arrays a few times in the past. The problem is the data types that can be handled sparsely. If it can now handle pyomo/gurobi objects then that's great!
Can you elaborate on the incompatibility?
The linked thread has MESSAGEix and PyPSA modellers, so it might be possible through some method.
Here is another thread that goes in detail for xarray
: https://github.com/pydata/xarray/issues/1375
I did some tests on this locally and it doesn't work out-of-the-box. Operations between sparse arrays don't work when the data types are objects, e.g.:
import calliope
import xarray
m = calliope.examples.urban_scale()
m.build(force=True)
# you can't transform an existing array into sparse, so this is a quick hack.
foo = xr.DataArray.from_series(m.backend.variables.flow_cap.to_series(), sparse=True)
bar = xr.DataArray.from_series(m.backend.parameters.flow_cap_max.to_series(), sparse=True)
foo * bar
[Out] TypeError: Implicit conversion of Pyomo numeric value (parameters[flow_cap_max][12]*variables[flow_cap][0]) to float is disabled.
This error is often the result of using Pyomo components as arguments to
one of the Python built-in math module functions when defining
expressions. Avoid this error by using Pyomo-provided math functions or
explicitly resolving the numeric value using the Pyomo value() function.
I've also tried setting the fill value of foo
and bar
to non-NaN vals and using np.multiply(foo, bar, dtype=np.object_)
. These all end up creating dtype issues in the scipy.sparse
library, e.g.:
File ~/miniforge3/envs/calliope/lib/python3.12/site-packages/sparse/_umath.py:542, in _Elemwise._get_fill_value(self)
540 # Store dtype separately if needed.
541 if self.dtype is not None:
--> 542 fill_value = fill_value.astype(self.dtype)
544 self.fill_value = fill_value
545 self.dtype = self.fill_value.dtype
AttributeError: 'float' object has no attribute 'astype'
File ~/miniforge3/envs/calliope/lib/python3.12/site-packages/sparse/_umath.py:524, in _Elemwise._get_fill_value(self)
521 fill_value_array = self.func(*np.broadcast_arrays(*zero_args), **self.kwargs)
523 try:
--> 524 fill_value = fill_value_array[(0,) * fill_value_array.ndim]
525 except IndexError:
526 zero_args = tuple(
527 arg.fill_value if isinstance(arg, COO) else _zero_of_dtype(arg.dtype) for arg in self.args
528 )
AttributeError: 'float' object has no attribute 'ndim'
File ~/miniforge3/envs/calliope/lib/python3.12/site-packages/sparse/_umath.py:545, in _Elemwise._get_fill_value(self)
542 fill_value = fill_value.astype(self.dtype)
544 self.fill_value = fill_value
--> 545 self.dtype = self.fill_value.dtype
AttributeError: 'float' object has no attribute 'dtype'
It works fine for operations on purely numeric data, just not when working with object arrays.
@brynpickering after messing around with the backend, I see what you mean... this one will be tough.
The big issue here is that np.nan
consumes the same amount of memory as a float
(64 bits), and we currently have nan
for stuff like carriers (i.e., adding a new carrier will add nans for all cases with that dim
). So, adding a single heat
technology to an electricity only model may bloat some parts by twice the amount of memory.
Pyomo does support sparsity natively... but I do not yet know the backend well enough to know how/what needs to change.
Algorithmically, a solution should be possible if we have full determinism when flattening matrixes. i.e., we always know the following:
dims
combinations.dims
will be added in relation to each other.dim
will be added. Not sure if we currently have this, but it should allow us to lazily fill in sparse vectors and then drop them to the backend. At least in theory...
We're already effectively using dense arrays as far as Pyomo is concerned. It's the application of operations across N dimensions (incl. broadcasting capabilities) that we benefit from by also representing those objects in NaN-filled arrays. We came from this full determinism in v0.6 to what we have now because it is very messy to ensure this in a generalised way, especially when math components don't share the exact same dimensions.
Yep, I understand that the issue is not in the backend. The increase in memory will only happen on our side... As far as a generalized algorithm... would something like this work? I've used something like this for similar problems.
import itertools
dims = ["techs", "nodes", "steps", "carriers", "foo", "bar", "perrito"]
all_unique_combinations_sorted = set()
for i in range(len(dims)):
for group in itertools.combinations(dims, i+1):
all_unique_combinations_sorted.add(".".join(sorted(group)).lower())
all_unique_combinations_sorted.add(["GLOBALS"]) # no idea if this is even needed
This contains all possible combinations of our dimensions, always in order (127 + GLOBALS
in this example).
Create a dict with them, and store a sparse xarray
in it. Add flattened vector data to it.
Lookups are easy: just sort and join the dimensions you want. Similarly, the large number of keys does not matter too much. Sparse data has little memory impact by design, and you can easily erase empty combinations if you wish.
Would this work, or am I saying something silly?
I'm closing this as not an issue that we plan to address, given how small our datasets are (even with all their NaNs) compared to peak memory consumption in the optimisation step (see this comment).
@brynpickering I'm ok in closing this.
Some closing remarks: my worry actually relates to the size of the optimisation problem. If we have a NaN parameter in the input data, there are two things that could happen:
However, I should've done more checks on this, so I will only re-open this if I identify it as a problem that reaches the actual optimisation.
What can be improved?
Calliope should be more memory efficient! <- Finally this one is applicable
While checking how to better support sparsity in our ecosystem, I found about
sparse
. It is quite literally focused on the common use-case of super sparse data in ESOMs.After some investigation, it seems like
xarray
either has, or is planning to, roll out support for this library: https://github.com/pydata/xarray/issues/3213.pandas
seems to have also rolled out support.I propose to evaluate how to integrate this into
calliope
, with two key design goals in mind:xarray
setups.dim
itself in our constraints for cases were they are ordered. Two use cases (for pathways):Version
v0.7