Qiskit / rustworkx

A high performance Python graph library implemented in Rust.
https://www.rustworkx.org
Apache License 2.0
1.03k stars 145 forks source link

Add manual iterator implementations for custom vec iterables #1107

Closed jakelishman closed 6 months ago

jakelishman commented 6 months ago

This adds manual Iterator implementors for the custom vec Iterable pyclasses, which were previously using the Python default iterator implementation that attempts calls to __getitem__ with successive integers until an IndexError is raised.

Doing this means that we no longer have any overhead from converting Python-space integer indices to Rust-space ones, because we just store a pointer to the backing vec internally and yield from that using an index we track ourselves.


Using the same setup code as discussed in #1096, with a manual iteration through afterwards:

import rustworkx as rx

N = 1_000_000
g = rx.PyGraph()
g.add_nodes_from(list(range(N)))
nodes = g.node_indexes()

%timeit for _ in nodes: pass

On main (which includes #1096), the timeit measured the iteration at ~78ms, and with this PR it dropped to ~48ms on my machine.

I didn't add all the new iterator structs to the typing information concretely, instead just referencing them opaquely as Iterator[T], because these objects only implement magic protocol methods and are not intended to be constructable / picklable / nameable from Python space at all (similar to how list_iterator isn't as a builtin). Open to changing that if preferred.

I'll mark this as:

but feel free to edit this to remove that link if you think there's more places to look for performance here.

coveralls commented 6 months ago

Pull Request Test Coverage Report for Build 7995769315

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
src/iterators.rs 75 93 80.65%
<!-- Total: 75 93 80.65% -->
Totals Coverage Status
Change from base Build 7982100393: -0.09%
Covered Lines: 16821
Relevant Lines: 17428

💛 - Coveralls
jakelishman commented 6 months ago

Sure, I can add those. Flicking quickly through the tests now, I suppose there's a question of whether we want to implement more efficient reversed iterators as well. That's generated separately to the sequence-protocol iterator, so what I've done here doesn't affect reversed(g.node_indices()) nor reversed(iter(g.node_indices()) (which continues to not work, which is valid behaviour).

I can throw in a reversed iterator for each of these if you like as well, but it equally might not be worth whatever increase in compilation times it comes with?

jakelishman commented 6 months ago

Actually: what kind of iter tests were you thinking? I deliberately made the actual iterator type unnameable from Python, so I can't really isinstance check it or anything like that, and there shouldn't be any real testable behaviour that differs from the implicit sequence-protocol iterator. Any test case that iterates through one of these nodes (which many existing tests do) is already calling iter on them, it's just implicit within CPython.

jakelishman commented 6 months ago

Release node added in 8f9a991.

mtreinish commented 6 months ago

Actually: what kind of iter tests were you thinking? I deliberately made the actual iterator type unnameable from Python, so I can't really isinstance check it or anything like that, and there shouldn't be any real testable behaviour that differs from the implicit sequence-protocol iterator. Any test case that iterates through one of these nodes (which many existing tests do) is already calling iter on them, it's just implicit within CPython.

I was just thinking of something that explicitly exercises it, mostly to say we did. I wasn't expecting a type check, I think the existing tests we have for this on mappings do something like:

test = iter(object)
result = list(test)
self.assertEqual(result, expected)

It might just be duplicate coverage as that implicitly happens in other tests, but I didn't see the harm in at least validating it explicitly

mtreinish commented 6 months ago

I can throw in a reversed iterator for each of these if you like as well, but it equally might not be worth whatever increase in compilation times it comes with?

That seems worth it to me, personally I'm not too worried about rustworkx compile times. It might slow down development and CI a little bit, but we're not really bottlenecked on that (if it came down to it we could leverage debug mode a lot more and tweak things to optimize more for build speed). But, making sure the library is higher performance for python users is in general a high priority, so having faster reversed() iterator implementations makes sense to me.

jakelishman commented 6 months ago

Ok, I've added the reversed iterators in 5f60d9e and explicit tests of both iter and reversed in b68aa5f. I tweaked the iter tests slightly so they compare list(x) == list(iter(x)), which doubles it as a test that the iterable is multiply consumable.

The timings of the reversed iterators are the same as the timings as the forwards iterators: this PR took the million-node NodeIndices iteration (with a reversed() around it) from about 76ms to about 47ms this time when I ran them.