ev-br opened this issue 8 years ago
Probably best to compare against pysparse.ll_mat
Also, memory usage would be important to understand and document
Very crude for now, just %timeit and %memit, using the example from pysparse docs: https://github.com/ev-br/sparr/blob/master/benchmarks.ipynb
Basically, this is now about 2x slower than ll_mat for single-element __setitem__, plus Python looping overhead. The memory consumption is not great, but not terrible either, at least at this problem size.
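For context, a crude pure-Python stand-in for that kind of measurement (plain `timeit` instead of the notebook's `%timeit`/`%memit`; the `fill` helper and the problem size are made up for illustration, and the dict is neither sparr nor ll_mat — it only gives a baseline feel for the Python looping overhead):

```python
# Crude stand-in benchmark: per-element __setitem__ cost of filling a
# dict keyed by (i, j). Not sparr's or ll_mat's data structure -- just
# a pure-Python baseline that isolates the looping overhead.
import timeit

def fill(n):
    d = {}
    for i in range(n):
        d[(i, i)] = 1.0  # one single-element __setitem__ per entry
    return d

n = 100_000
seconds = timeit.timeit(lambda: fill(n), number=3) / 3
print(f"{seconds / n * 1e6:.3f} us per element")
```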
The memory usage may be a non-trivial disadvantage for std::map
It could be, yes. Like I was saying over at the scipy issue, I wonder what the deal is with that Wikipedia link on std::map: it's being invoked for both sides of the argument.
On a more serious note, there's no question that a sorted vector of sorted vectors wins if there's an idea of the number of rows/columns. Otherwise it's either reallocation or back to essentially the same memory as std::map, no?
Vec of vecs has a per-row overhead, whereas std::map has a per-entry overhead. This matters if you have a lot of entries.
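A rough way to see the tradeoff, using Python containers as stand-ins (a dict entry for a std::map node, a per-row list for a per-row std::vector; `map_like_bytes` and `lil_like_bytes` are made-up names). The absolute numbers are CPython-specific and much smaller than real std::map node overheads, but the shape of the tradeoff is the same:

```python
# Per-entry vs per-row container overhead, sketched with Python
# stand-ins. Only the scaling behaviour carries over to C++.
import sys

def map_like_bytes(entries):
    # per-entry layout: one key -> value mapping per stored element
    m = {(i, j): 1.0 for i, j in entries}
    return sys.getsizeof(m)

def lil_like_bytes(nrows, entries):
    # per-row layout: one container per row, paid even for empty rows
    rows = [[] for _ in range(nrows)]
    for i, j in entries:
        rows[i].append((j, 1.0))
    return sys.getsizeof(rows) + sum(sys.getsizeof(r) for r in rows)

# many entries in few rows: the per-row overhead amortizes away
dense = [(i, j) for i in range(10) for j in range(1000)]
# one entry per row: the per-row overhead is paid in full
sparse = [(i, 0) for i in range(10_000)]
print("dense, LIL-like :", lil_like_bytes(10, dense), "bytes")
print("dense, map-like :", map_like_bytes(dense), "bytes")
print("sparse, LIL-like:", lil_like_bytes(10_000, sparse), "bytes")
print("sparse, map-like:", map_like_bytes(sparse), "bytes")
```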
But don't you have to reallocate vectors if there's an insertion into a row which was previously empty, with non-empty rows on both sides of it?
In https://github.com/scipy/scipy/pull/6004 I pre-allocate all the row vectors, so I pay the per-vector overhead up-front.
Makes sense --- so you require an a priori estimate of the number of rows?
I just realised you've been working on this recently, apologies for stepping on your toes! I scanned scipy PRs so as not to duplicate work, but yours was separate :(
Yes. This can be relaxed with some work.
Re: toes: No problem whatsoever! Hope you don't feel I'm stepping on yours.
In your solution with preallocation: you can probably just have a second vector with row indices. The overhead is at most nrows, but there's no reallocation at all.
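That "second vector with row indices" idea could look something like this (a Python sketch with a made-up `IndexedLil` name, not sparr's API; in C++ terms, inserting a new row shifts only small index entries and vector headers, never the row data itself):

```python
# Only non-empty rows get storage; a sorted index array of length
# <= nrows maps a logical row number to its slot. Toy code.
import bisect

class IndexedLil:
    def __init__(self):
        self.row_ids = []  # sorted logical row numbers of non-empty rows
        self.rows = []     # rows[k] holds the entries of row row_ids[k]

    def __setitem__(self, key, value):
        i, j = key
        k = bisect.bisect_left(self.row_ids, i)
        if k == len(self.row_ids) or self.row_ids[k] != i:
            # new non-empty row: only the index array and the list of
            # row references shift, O(number of non-empty rows)
            self.row_ids.insert(k, i)
            self.rows.insert(k, [])
        self.rows[k].append((j, value))
```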
Yep, or have a resize call that allocates new vectors when necessary.
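The resize-on-demand variant, sketched in Python (`GrowableLil` is a made-up name, not sparr's internals): rows are allocated lazily, so no a-priori row count is required.

```python
# Rows are grown on demand; resize never shrinks, so existing row
# storage is left untouched. Toy code for illustration only.
class GrowableLil:
    def __init__(self):
        self.rows = []  # one list per allocated row

    def resize(self, nrows):
        # allocate empty rows up to nrows when necessary
        while len(self.rows) < nrows:
            self.rows.append([])

    def __setitem__(self, key, value):
        i, j = key
        self.resize(i + 1)
        self.rows[i].append((j, value))
```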
https://github.com/ev-br/sparr/pull/40#issuecomment-208940574 has some numbers on constructing a Poisson 2D matrix from the pysparse docs.
I've no idea why the memory benchmarks report all zeros. From peakmem, fast_lil_mat actually looks good :-). Timewise, ll_mat wins hands down.
and actually compare to, say, dok_matrix.