get_walkable_addresses uses a considerable amount of CPU cycles #252

Tribler / py-ipv8

Python implementation of Tribler's IPv8 p2p-networking layer

GNU Lesser General Public License v3.0

Closed: devos50 closed this issue 6 years ago

devos50 commented 6 years ago

To address issue #78, I enabled Yappi for the TrustChain crawler and monitored CPU usage over a period of a few hours. See the following report:

The get_walkable_addresses method takes a significant amount of processing power (38.5%!). A breakdown shows that almost all this time is spent in the <dictcomp> method, which in turn has the following performance:

This makes walking through the network expensive in terms of CPU usage. @qstokkink is there any easy way to make this more efficient?
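For anyone wanting to reproduce a measurement of this kind, here is a minimal sketch. It uses the standard-library cProfile module instead of Yappi (which the report above used) so that it runs without extra dependencies. The `ToyNetwork` class and its fields are hypothetical stand-ins, not py-ipv8 code; only the shape of the hot spot, a dict comprehension rebuilt on every call, is taken from the discussion.

```python
import cProfile
import io
import pstats


class ToyNetwork:
    """Hypothetical stand-in for a peer store; not the py-ipv8 Network class."""

    def __init__(self, peer_count):
        # Maps a peer id to (address, verified flag); every other peer is verified.
        self.peers = {i: (("10.0.0.1", 1024 + i), i % 2 == 0) for i in range(peer_count)}

    def get_walkable_addresses(self):
        # Rebuilds a dict over all peers on every call: O(n) work per walk step.
        unverified = {pid: addr for pid, (addr, verified) in self.peers.items() if not verified}
        return list(unverified.values())


def profile_walks(network, steps):
    """Profile `steps` calls and return the pstats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(steps):
        network.get_walkable_addresses()
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    return stream.getvalue()


report = profile_walks(ToyNetwork(2000), 200)
print(report)
```

Note that on Python 3.12 and later, comprehensions are inlined into the enclosing function, so a separate `<dictcomp>` row will not appear; the time is attributed to `get_walkable_addresses` itself.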

qstokkink commented 6 years ago

Most of the hurt seems to actually be in the <dictcomp> in network.py:

https://github.com/Tribler/py-ipv8/blob/d89d7bb42e4647cf34f14966b9c3ae0bf1632f8a/ipv8/peerdiscovery/network.py#L151-L152

We could optimize this.
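One possible direction, sketched under assumptions: instead of filtering a per-peer dictionary with a comprehension on every call, keep a reverse index from service to peers and update it whenever a peer announces a service. The names `ServiceIndex`, `add_service`, `remove_peer`, and `peers_for_service` are hypothetical, and the linked lines may well do something different. The sketch only illustrates trading a per-call O(n) scan for incremental bookkeeping.

```python
from collections import defaultdict


class ServiceIndex:
    """Hypothetical sketch: forward map plus a reverse index kept in sync."""

    def __init__(self):
        self.services_per_peer = defaultdict(set)   # peer -> services
        self.peers_per_service = defaultdict(set)   # service -> peers (reverse index)

    def add_service(self, peer, service):
        # Update both maps at write time so reads need no scan.
        self.services_per_peer[peer].add(service)
        self.peers_per_service[service].add(peer)

    def remove_peer(self, peer):
        # Drop the peer from every service it was listed under.
        for service in self.services_per_peer.pop(peer, set()):
            self.peers_per_service[service].discard(peer)

    def peers_for_service_scan(self, service):
        # The expensive pattern: walks every known peer on each call.
        return {peer for peer, services in self.services_per_peer.items() if service in services}

    def peers_for_service(self, service):
        # The indexed pattern: a single lookup, copied so callers cannot mutate the index.
        return set(self.peers_per_service.get(service, ()))


index = ServiceIndex()
index.add_service("alice", "trustchain")
index.add_service("bob", "trustchain")
index.add_service("bob", "dht")

# Both access paths agree; only their cost per call differs.
assert index.peers_for_service("trustchain") == index.peers_for_service_scan("trustchain")

index.remove_peer("bob")
print(index.peers_for_service("trustchain"))
```

The cost moves from every read to every write, which should pay off when lookups (walk steps) are far more frequent than peer announcements.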

devos50 commented 6 years ago

I managed to reproduce this one in Gumby, using an overly aggressive walker (with a walking interval of 0.05 sec, or 20x/sec). In this experiment with 500 nodes, 71.22% of the time is spent in the get_walkable_addresses method. Also, 23.78% of the time is spent in b64encode.

[Image: yappi_2]
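If the b64encode time comes from re-encoding the same key material on every walk step (an assumption; the numbers above do not say who the caller is), memoizing the encoded form is one cheap mitigation. The helper name `encoded_key` and the 32-byte dummy keys are hypothetical; `base64.b64encode` and `functools.lru_cache` are standard library.

```python
import base64
from functools import lru_cache


@lru_cache(maxsize=4096)
def encoded_key(raw_key: bytes) -> bytes:
    """Return the base64 form of a key, computing it once per distinct key."""
    return base64.b64encode(raw_key)


# 100 distinct dummy keys, each looked up 50 times (e.g. once per walk step).
keys = [bytes([i]) * 32 for i in range(100)]
for _ in range(50):
    for key in keys:
        encoded_key(key)

info = encoded_key.cache_info()
print(info.hits, info.misses)
```

With 100 distinct keys and 50 passes, only the first pass actually encodes anything; the remaining lookups are served from the cache. The cache size bounds memory use and would need tuning to the expected number of peers.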

According to the performance graphs, it is clear that the cost of peer discovery increases over time (which probably correlates with the number of entries in the services_per_peer dictionary):

[Image: utimes]

This should give us a baseline for optimising the network.py file.

qstokkink commented 6 years ago

I think we can just strip NetworkX out of network.py; nobody is using the actual graph anyway. Also, that saves a dependency.
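A rough sketch of what a graph-free bookkeeping structure could look like, using only built-in sets. The class `SimpleNetwork` and its method names other than `get_walkable_addresses` are hypothetical and simplified; this is not the actual replacement code, just an illustration of keeping the walkable set up to date incrementally so that reading it needs no scan or graph traversal.

```python
class SimpleNetwork:
    """Hypothetical sketch: track addresses with plain sets instead of a graph."""

    def __init__(self):
        self.verified_addresses = set()  # addresses of peers we have verified
        self.walkable_addresses = set()  # addresses we know of but have not verified

    def discover_address(self, address):
        # A newly heard-of address is walkable unless it is already verified.
        if address not in self.verified_addresses:
            self.walkable_addresses.add(address)

    def add_verified_peer(self, address):
        # Verification moves the address out of the walkable set.
        self.verified_addresses.add(address)
        self.walkable_addresses.discard(address)

    def get_walkable_addresses(self):
        # No per-call filtering: the set is already maintained on writes.
        return list(self.walkable_addresses)


network = SimpleNetwork()
network.discover_address(("1.2.3.4", 8090))
network.discover_address(("5.6.7.8", 8090))
network.add_verified_peer(("1.2.3.4", 8090))
print(network.get_walkable_addresses())
```

Copying the set into a list on each call is still linear in the number of walkable addresses, but it avoids any per-peer filtering logic or encoding work on the read path.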