decred / dcrd

Decred daemon in Go (golang).
https://decred.org
ISC License

CheckLiveTickets is hilariously unoptimized #681

Closed jrick closed 4 years ago

jrick commented 7 years ago

It should at the very least search for each ticket concurrently. An even better algorithm would search the entire data structure once for all of the hashes and merge the results.
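The first suggestion (per-ticket concurrent lookups) can be sketched as below. This is an illustration only, not dcrd's actual API: the sorted-slice binary search stands in for an immutable-treap lookup, which is safe to share across goroutines because the snapshot is never mutated.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// checkLiveConcurrent looks up each ticket hash in its own goroutine
// against an immutable, sorted snapshot of the live-ticket set. Each
// lookup is O(log T), mirroring a treap search.
func checkLiveConcurrent(live []string, hashes []string) []bool {
	results := make([]bool, len(hashes))
	var wg sync.WaitGroup
	for i, h := range hashes {
		wg.Add(1)
		go func(i int, h string) {
			defer wg.Done()
			// Distinct result indices, so no lock is needed.
			j := sort.SearchStrings(live, h)
			results[i] = j < len(live) && live[j] == h
		}(i, h)
	}
	wg.Wait()
	return results
}

func main() {
	live := []string{"aaa", "bbb", "ddd"}
	fmt.Println(checkLiveConcurrent(live, []string{"bbb", "ccc"}))
}
```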

jpz commented 7 years ago

Hi, I had a look at this to do some analysis.

You have to walk the binary tree (treap) for each of the tickets, which cannot be done concurrently.

What is the severity of the performance impact?

An O(1)-per-ticket solution, only worth doing if it justifies the added complexity, would be to maintain a hashtable alongside the ordered treap, as a secondary index supporting O(1) retrieval by key. I presume the treap is used because an in-order representation is preferable or required. If my presumption is incorrect, the other solution would be to simply replace the treap with a hashtable.


The reason I come to this is by looking at the other approaches:

Let's take the parameters:

Presently the algorithm looks like it is O(Q ln T): Q binary-tree queries of O(ln T) each.

Alternatively, if you collate all the keys into a hashset at query time, that is an O(T) operation; thereafter you query Q times against that intermediate product, for a total of O(a·Q + b·T). However, if Q is very low (Q=1, for instance) and T is large, this does not look like an efficient algorithm.
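The collation approach described above can be sketched as follows (illustrative names; the live set is represented as a plain slice rather than dcrd's treap):

```go
package main

import "fmt"

// checkLiveViaSet collates all live ticket keys into a hash set in one
// O(T) pass, then answers the Q queries in O(1) each: O(T + Q) total.
func checkLiveViaSet(live []string, hashes []string) []bool {
	set := make(map[string]struct{}, len(live))
	for _, k := range live { // one full walk of the live set
		set[k] = struct{}{}
	}
	results := make([]bool, len(hashes))
	for i, h := range hashes {
		_, results[i] = set[h]
	}
	return results
}

func main() {
	live := []string{"aaa", "bbb", "ddd"}
	fmt.Println(checkLiveViaSet(live, []string{"ddd", "eee"}))
}
```

As noted, this only pays off when Q is large enough to amortize the O(T) pass.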

Lastly, to brainstorm the only other approach I can think of: walk the tree once, comparing each node's key against the Q query keys, which also doesn't look like a good approach at O(Q·T).

davecgh commented 7 years ago

I don't see why it couldn't be done concurrently. It's an immutable treap, so you should be able to do each lookup in a separate goroutine without any issues. That said, I'd like to see the specific data set that prompted this issue so we can get some good profiling information. Based on the information provided, it looks to me like it's an n * log(n) operation. Due to that, I'd be willing to bet profiling would show that the overhead of launching goroutines and collating the results means it wouldn't be any faster than the current code.

Given it's effectively a binary tree, you can't really do much in terms of "searching the entire data structure once". As mentioned, doing it any faster would require basically maintaining a separate map of live tickets alongside the treap. It probably wouldn't be horrible in terms of memory usage, since the target live ticket count is 40960 (on mainnet) and thus it would only take around ~320k (plus map overhead) to maintain it. However, there is additional complexity to contend with and, worse, it would probably need to go in the critical path that is latency sensitive. It might be worthwhile to compare building the map on the fly in CheckLiveTickets based on a single iteration of the treap, with the hashes then looked up against it (perhaps only when len(hashes) > 50, or some value determined through a bit of empirical testing).
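A rough sketch of that hybrid (hypothetical names; the sorted slice and binary search stand in for the treap, and the 50 cutover is the untested guess above, to be replaced by a profiled value):

```go
package main

import (
	"fmt"
	"sort"
)

// Below this many queries, per-hash O(log T) lookups win; above it, one
// O(T) pass building a map amortizes better. Placeholder value.
const setThreshold = 50

func checkLive(live []string, hashes []string) []bool {
	results := make([]bool, len(hashes))
	if len(hashes) > setThreshold {
		// Single iteration over the live set, then O(1) lookups.
		set := make(map[string]struct{}, len(live))
		for _, k := range live {
			set[k] = struct{}{}
		}
		for i, h := range hashes {
			_, results[i] = set[h]
		}
		return results
	}
	// Few queries: search per hash, as the current code does.
	for i, h := range hashes {
		j := sort.SearchStrings(live, h) // stands in for a treap lookup
		results[i] = j < len(live) && live[j] == h
	}
	return results
}

func main() {
	live := []string{"aaa", "bbb", "ddd"}
	fmt.Println(checkLive(live, []string{"aaa", "zzz"}))
}
```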

As to the ordering question, the live tickets must be maintained ordered because the lottery process has to be able to reproduce the same vote selection for consensus on all nodes. The alternative would be to maintain them unsorted and sort before the selection, but that is in the vote path, and would add quite a bit of delay between the moment a new block comes in and when the vote notifications can be sent, which would be bad given how latency sensitive votes are.

jpz commented 7 years ago

Since commenting, I was reading more of your original check-in comments @davecgh - I just mention as I'm sure it's nice to know people take the time to read them!

My use of the word concurrent was not in terms of parallelisation on multiple cores; I do agree it's an embarrassingly parallel problem, and parallelisation is possible.

By the way, if I use "hashtable" in preference to "map" in this discussion, it's because maps are trees in C++ while maps are hashtables in Go and Java; "map" is really just an interface, so the word describes the interface more than the underlying data structure.

Anyway, I see a problem in any case. The biggest complication in adding a secondary hashtable index like this would be preserving the O(1) lockless performance characteristics of the treap, as outlined in the comments here: https://github.com/decred/dcrd/blob/e6c2a627e84647c8be571f2bf7982b7167a8c545/blockchain/stake/internal/tickettreap/immutable.go#L24-L39

Any trivial hashtable (e.g. a Go map) implementation would need the hashtable to be copied and updated, at a potential O(n) cost, whenever a new version was created, which would significantly impact insert/delete costs as well as memory usage.
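To make the point concrete, here is a minimal sketch of why a plain Go map can't give cheap immutable versions: preserving the old version across a mutation means copying every entry, an O(n) cost per new version (versus the treap's O(log n) path copying):

```go
package main

import "fmt"

// snapshot returns an independent copy of m. Unlike the immutable
// treap, which shares unchanged subtrees between versions, a Go map
// must be copied entry by entry: O(n) work and O(n) extra memory.
func snapshot(m map[string]bool) map[string]bool {
	cp := make(map[string]bool, len(m))
	for k, v := range m { // full copy: touches all n entries
		cp[k] = v
	}
	return cp
}

func main() {
	v1 := map[string]bool{"t1": true, "t2": true}
	v2 := snapshot(v1) // O(n) just to create a new version
	v2["t3"] = true    // mutate the copy; v1 is unchanged
	fmt.Println(len(v1), len(v2))
}
```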

I looked around, and I did find a lockless, snapshotting map-type data structure with characteristics similar to the treap's; I mention it for completeness: the "ctrie". However, to achieve those characteristics, lookup drops back to O(ln N), which is no better than the treap. (https://github.com/Workiva/go-datastructures#ctrie and http://lampwww.epfl.ch/~prokopec/ctries-snapshot.pdf)


To conclude, I think we need a real life case, and to profile it. I fear I'm wasting people's time getting overly academic here, but I thought the ticket worth discussing in any case... it's the kind of tuning I enjoy looking at.

chappjc commented 7 years ago

> I fear I'm wasting people's time getting overly academic here, but I thought the ticket worth discussing in any case... it's the kind of tuning I enjoy looking at.

Me too. I'm enjoying this discussion. Not a waste of time to me.

davecgh commented 4 years ago

Discussed with @jrick via another platform and this can be closed per older comments.