ev-br opened this issue 8 years ago
Probably best to compare against pysparse.ll_mat
Also, memory usage would be important to understand and document
Very crude for now, just %timeit and %memit, using the example from pysparse docs: https://github.com/ev-br/sparr/blob/master/benchmarks.ipynb
Basically, this is now about 2x slower than ll_mat for single-element __setitem__, plus Python looping overhead. The memory consumption is not great, but not terrible either, at least at this problem size.
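For context, a crude pure-Python stand-in for that kind of measurement (plain `timeit` instead of the notebook's `%timeit`/`%memit`; the `fill` helper and the problem size are made up for illustration, and the dict is neither sparr nor ll_mat — it only gives a baseline feel for the Python looping overhead):

```python
# Crude stand-in benchmark: per-element __setitem__ cost of filling a
# dict keyed by (i, j). Not sparr's or ll_mat's data structure -- just
# a pure-Python baseline that isolates the looping overhead.
import timeit

def fill(n):
    d = {}
    for i in range(n):
        d[(i, i)] = 1.0  # one single-element __setitem__ per entry
    return d

n = 100_000
seconds = timeit.timeit(lambda: fill(n), number=3) / 3
print(f"{seconds / n * 1e6:.3f} us per element")
```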
The memory usage may be a non-trivial disadvantage for std::map
It could be, yes. Like I was saying over at the scipy issue, I wonder what the deal is with that Wikipedia link on std::map: it's being invoked for both sides of the argument.
On a more serious note, there's no question that a sorted vector of sorted vectors wins if there's an idea of the number of rows/columns. Otherwise it's either reallocation or back to essentially the same memory as std::map, no?
Vec of vecs has a per-row overhead, whereas std::map has a per-entry overhead. This matters if you have a lot of entries.
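A rough way to see the tradeoff, using Python containers as stand-ins (a dict entry for a std::map node, a per-row list for a per-row std::vector; `map_like_bytes` and `lil_like_bytes` are made-up names). The absolute numbers are CPython-specific and much smaller than real std::map node overheads, but the shape of the tradeoff is the same:

```python
# Per-entry vs per-row container overhead, sketched with Python
# stand-ins. Only the scaling behaviour carries over to C++.
import sys

def map_like_bytes(entries):
    # per-entry layout: one key -> value mapping per stored element
    m = {(i, j): 1.0 for i, j in entries}
    return sys.getsizeof(m)

def lil_like_bytes(nrows, entries):
    # per-row layout: one container per row, paid even for empty rows
    rows = [[] for _ in range(nrows)]
    for i, j in entries:
        rows[i].append((j, 1.0))
    return sys.getsizeof(rows) + sum(sys.getsizeof(r) for r in rows)

# many entries in few rows: the per-row overhead amortizes away
dense = [(i, j) for i in range(10) for j in range(1000)]
# one entry per row: the per-row overhead is paid in full
sparse = [(i, 0) for i in range(10_000)]
print("dense, LIL-like :", lil_like_bytes(10, dense), "bytes")
print("dense, map-like :", map_like_bytes(dense), "bytes")
print("sparse, LIL-like:", lil_like_bytes(10_000, sparse), "bytes")
print("sparse, map-like:", map_like_bytes(sparse), "bytes")
```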
But don't you have to reallocate vectors if there's an insertion into a row which was previously empty, with non-empty rows on both sides of it?
In https://github.com/scipy/scipy/pull/6004 I pre-allocate all the row vectors, so I pay the per-vector overhead up-front.
Makes sense --- so you require an a priori estimate of the number of rows?
I just realised you've been working on this recently, apologies for stepping on your toes! I scanned scipy PRs so as not to duplicate work, but yours was separate :(
Yes. This can be relaxed with some work.
Re: toes: No problem whatsoever! Hope you don't feel I'm stepping on yours.
In your solution with preallocation: you can probably just have a second vector with row indices. The overhead is at most nrows, but there's no reallocation at all.
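That "second vector with row indices" idea could look something like this (a Python sketch with a made-up `IndexedLil` name, not sparr's API; in C++ terms, inserting a new row shifts only small index entries and vector headers, never the row data itself):

```python
# Only non-empty rows get storage; a sorted index array of length
# <= nrows maps a logical row number to its slot. Toy code.
import bisect

class IndexedLil:
    def __init__(self):
        self.row_ids = []  # sorted logical row numbers of non-empty rows
        self.rows = []     # rows[k] holds the entries of row row_ids[k]

    def __setitem__(self, key, value):
        i, j = key
        k = bisect.bisect_left(self.row_ids, i)
        if k == len(self.row_ids) or self.row_ids[k] != i:
            # new non-empty row: only the index array and the list of
            # row references shift, O(number of non-empty rows)
            self.row_ids.insert(k, i)
            self.rows.insert(k, [])
        self.rows[k].append((j, value))
```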
Yep, or have a resize call that allocates new vectors when necessary.
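The resize-on-demand variant, sketched in Python (`GrowableLil` is a made-up name, not sparr's internals): rows are allocated lazily, so no a-priori row count is required.

```python
# Rows are grown on demand; resize never shrinks, so existing row
# storage is left untouched. Toy code for illustration only.
class GrowableLil:
    def __init__(self):
        self.rows = []  # one list per allocated row

    def resize(self, nrows):
        # allocate empty rows up to nrows when necessary
        while len(self.rows) < nrows:
            self.rows.append([])

    def __setitem__(self, key, value):
        i, j = key
        self.resize(i + 1)
        self.rows[i].append((j, value))
```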
https://github.com/ev-br/sparr/pull/40#issuecomment-208940574 has some numbers on constructing a Poisson 2D matrix from the pysparse docs.
I've no idea why the memory benchmarks report all zeros. From peakmem, fast_lil_mat actually looks good :-). Timewise, ll_mat wins hands down.
and actually compare to, say, dok_matrix.