Closed janw20 closed 7 months ago
The idea is to have a similar interface as the array class from
ndarray
, but not store zeros to keep down the file size, soPackedArray<T, D>
will be the successor ofSparseArray3<T>
. The new type will compress a bit better and more importantly it supports arbitrary dimensions.
Yes, I just read the first part, and it was not obvious that it was the sparse array, sorry.
Skimming through it, there were no docs, so I made the wrong inference.
I deleted the comment right after, as soon as I've seen the Index
and IndexMut
implementation.
No worries! We're happy you're interested and that you're looking at the code.
Btw: what about just using SparseArray
? (w/o the 3)
The adjective 'sparse' never made sense, because when you'd have to decide whether the arrays were sparse or dense, they're clearly dense.
Another way of explaining it is that the data structures for representing sparse matrices would clearly be inefficient for our use cases and in fact need much more memory. There's usually a connected subset of the array that is non-zero, and when you integrate long enough the this subset should even be simply connected.
From your comment, I'd say that it is an optimized strategy for the sparse matrix representation (as there is not a single best strategy). I.e., your matrix has a relevant amount of default elements, and you could compress your representation by avoiding writing down all elements.
However, names are relevant but not crucial. So, if packed sounds more appropriate, just keep that :)
From your comment, I'd say that it is an optimized strategy for the sparse matrix representation (as there is not a single best strategy). I.e., your matrix has a relevant amount of default elements, and you could compress your representation by avoiding writing down all elements.
That's basically it, we first map the D
-dimensional indices to one-dime sional and then we store the non-zero (non-defaulted) elements a linearized vector. For zeros we just save where they are and how many of them are consecutive. Finally there's an exception where it makes sense to explicitly store zeros, however in practice it shouldn't make a big difference.
After looking into the sparse matrix Wikipedia article, I think we're doing a kind of band matrix (at least for D=2
), which (according to Wikipedia) is a sparse matrix. So I think I would be fine with calling it SparseArray
, also because that is more understandable and consistent with SparseArray3
I wrote 4 benchmarks with the criterion
crate where 500 random elements are inserted into a SparseArray3
or a PackedArray
of shapes [40, 50, 30]
or [10, 5, 7]
, and it seems like the performances of both are very similar, with PackedArray
being about ~5-10% faster, depending on the RNG seed
just for the sake of knowledge transfer: I think I understand the general idea (you're saving coordinates), but index_mut
could only profit from some more comments :see_no_evil: (as this is clearly a crucial part)
Hi Felix, do you mean ///
documentation for the index_mut
function? Or just in-line comments? Also, how is this normally done, should I commit to this pull request again, and afterward it will be again rebased? Or should I open a new branch on the main repo?
@janw20: I believe @felixhekhorn wants a better description of what's going inside index_mut
. In that case document it with //
inside the method, because the publicly available documentation (the one marked with ///
) should only explain what the method does, not how. You can commit straight to master
, let me know if you have permission problems.
I improved the documentation and committed it in https://github.com/NNPDF/pineappl/commit/6a711bbd347a0f1d090ddac043ce0358cc5fa45d to master
.
PackedArray
is possibly a bad name and that would be my fault. If anyone can think of a better name let us know!The idea is to have a similar interface as the array class from
ndarray
, but not store zeros to keep down the file size, soPackedArray<T, D>
will be the successor ofSparseArray3<T>
. The new type will compress a bit better and more importantly it supports arbitrary dimensions.