Closed qstokkink closed 3 years ago
If I understand the task correctly, nothing in Python should beat the O(1) speed of a dict lookup:
class Network(object):
def __init__(self):
self.verifyed_peers = set()
self.peers_by_key = dict()
self.peers_by_address = dict()
self.peers_by_service = defaultdict(set)
...
def add_verified_peer(self, peer):
with self.graph_lock:
...
self.verified_peers.add(peer)
self.peers_by_key[peer.public_key.key_to_bin()] = peer
for address in peer.addresses:
self.peers_by_address[address] = peer
for service_id in peer.services:
self.peers_by_service[service_id].add(peer)
def remove_peer(self, peer):
with self.graph_lock:
for service_id in peer.services:
self.peers_by_service[service_id].discard(peer)
for address in peer.addresses:
self.peers_by_address.pop(address)
self.peers_by_key.pop(peer.public_key.key_to_bin())
self.peers.discard(peer)
def get_verifyed_by_public_key_bin(self, public_key_bin):
with self.graph_lock:
return self.peers_by_key.get(public_key_bin)
def get_verifyed_by_address(self, address):
with self.graph_lock:
return self.peers_by_address.get(address)
def get_peers_for_service(self, service_id):
with self.graph_lock:
return self.peers_by_serivce.get(service_id).copy()
@kozlovsky I must disqualify that answer as you as you deviated from the question by changing the set
datastructure. But, in the interest of science, I changed the container
to a dict
. The results may be surprising (clickbait intended).
Here's your report:
Cheating notice: container is now a dict instead of a set. Answer does not qualify for the Snickers bar.
found = container[search]
Total time | 0.0577s |
Time per search | 0.577μs |
Speedup vs. for-loop | 24.9324 |
This surprised me. So, I ran this a few times, but the "map hijack" from OP is still faster1 than the dict lookup for the same input.
[1] There is some variance. There are runs where speed is comparable and there are runs when "map hijack" is more than 3 times faster than a dict lookup. Overall, the fastest result for the map was 0.141μs and the fastest for the dict lookup is 0.181μs, the slowest lookup for the map was 0.602μs and the slowest dict lookup was 0.701μs.
We have several instances of lookups by set hash in the
Network
class. These all look like this example:https://github.com/Tribler/py-ipv8/blob/e060a37a0ddad5f3c061305bb56ebb107c494340/ipv8/peerdiscovery/network.py#L202-L212
Even though for-loops are fast, we can (and probably should) optimize these lookups. Note that
set
objects don't have afind()
orget()
method. I'll document the three fastest lookups I found here.Each experiment uses 100000 repetitions with a set of 300 elements (all integers in
[0, 300)
). All code fragments operate of a given set (container
), to find an object (search
) that has a field (a
,search.a = randint(0, 299)
for each repetition) to match to the field of the set elements, if no match is found then the result (found
) should beNone
.for loop (current)
set operators
Note:
{search} & container
orcontainer & {search}
will not work. Python will give you the element from the smallest set back - our "fake" search object.map hijack
So far, the "map hijack" is the fastest. If you have a faster lookup method, please let me know 😃