Tribler / py-ipv8

Python implementation of Tribler's IPv8 p2p-networking layer
GNU Lesser General Public License v3.0
231 stars 46 forks source link

Remove RetryRequestCache immediately after removing a circuit #1044

Closed ichorid closed 2 years ago

ichorid commented 3 years ago

Observed during anonymization tests:

21-10-01 14:32:47,639 - TriblerTunnelCommunity-e5033a5c00f3c496f4b69a4bdfd96271e1dffcb1-TrueFalse - INFO - Got an exit socket 1665264179 UDPv4Address(ip='231.47.208.150', port=9696)
2021-10-01 14:32:47,640 - TriblerTunnelCommunity-c64a62f029bba22a25f2833b1897f2a32cbc503c-FalseTrue - INFO - Removed circuit 1665264179 leaving hidden swarm
2021-10-01 14:32:47,640 - TriblerTunnelCommunity-c64a62f029bba22a25f2833b1897f2a32cbc503c-FalseTrue - DEBUG - Got cell(3) from circuit 1665264179 (sender UDPv4Address(ip='182.86.106.35', port=50477), receiver Peer<231.47.208.150:9696, xkpi8Cm7oiol8oM7GJfyoyy8UDw=>)
2021-10-01 14:32:47,642 - TriblerTunnelCommunity-c64a62f029bba22a25f2833b1897f2a32cbc503c-FalseTrue - ERROR - Exception occurred while handling packet!
Traceback (most recent call last):
  File "/home/vader/my_SRC/TRIBLER/tribler_ichorid/src/pyipv8/ipv8/messaging/anonymization/community.py", line 647, in on_packet_from_circuit
    result = handler(source_address, data, circuit_id)
  File "/home/vader/my_SRC/TRIBLER/tribler_ichorid/src/pyipv8/ipv8/messaging/anonymization/community.py", line 42, in wrapper
    return func(self, source_address, payload, circuit_id)
  File "/home/vader/my_SRC/TRIBLER/tribler_ichorid/src/pyipv8/ipv8/messaging/anonymization/community.py", line 727, in on_created
    circuit = self.circuits[circuit_id]
KeyError: 1665264179

What happens is: remove_circuit removes circuit_id from self.circuits, while leaving the associated "retry" request_cache uncleaned. Then, on_created receives a message, sees the cache is still there and tries to get access it, resulting in the error above.

qstokkink commented 3 years ago

Seems like log output from this statement:

https://github.com/Tribler/py-ipv8/blob/72c304f34891a9e963c444f473ce4f2868cfd711/ipv8/messaging/anonymization/community.py#L650-L652

[Administration] This should not be crashing IPv8: lowering priority to "enhancement".

It would be better to log a warning in case of OP and return from on_created() if circuit_id not in self.circuits.

ichorid commented 3 years ago

Well, this is definitely a bug in Anonymization community, and since the code responsible for this is hosted in this repository, I report it here :wink:

qstokkink commented 3 years ago

@ichorid I'm not sure what the latter half of your comment is responding to. I never claimed that you are posting in the wrong repository.

I do not see the grounding in the former half of your comment stating "this is definitely a bug". All I see is a log statement that prints the abrupt termination of a message handler (at the correct point in the execution flow) and then logs this abnormal execution flow. I see that we can "enhance" this by describing the abnormal execution better in a log statement of lesser severity, which is why I labeled this issue as an "enhancement". Could you elaborate on your statement?

ichorid commented 3 years ago

I see that we can "enhance" this by describing the abnormal execution better in a log statement of lesser severity,

Well, as the IPv8 author you see an opportunity for enhancement of logging statements. As the maintainer of Tribler anonymity feature, I see a bug in the Anonymity community logic :shrug: I'm going to file a PR to address this at the level of Anonymity community logic.

qstokkink commented 3 years ago

You stated "this is definitely a bug in Anonymization community" and I asked you to elaborate on this statement because I do not agree and I gave you the reason why. You answered with "I see a bug in the Anonymity community". This is a repetition and it does not explain anything. Being a maintainer of Tribler anonymity doesn't address this lack of explanation in any way. You could be Her Majesty The Queen of England and it still would not explain why you see this as a bug. Please explain your point of view.

If you cannot qualify why this log statement should be anything but a log statement, please don't expect any PR to the most fragile internals of IPv8 to be anything but instantly rejected.

ichorid commented 3 years ago

Well, the original post above explained the bug ("What happens is..."). Let's elaborate:

  1. Anonymity community (AC) creates a circuit and add the associated cache entry.
  2. Some event runs remove_circuit, which (correctly) removes the circuit_id from self.circuits while (incorrectly) leaving the associated cache in place.
  3. The remote peer responds to the original circuit creation request and sends the CreatedPayload back (correctly)
  4. The local peer processes the payload (on_created), checks (correctly) if there exists the associated cache entry (it is still there) and proceeds to get the associated circuit from self.circuits dict.
  5. :boom: , there is not associated circuit entry (it was deleted as Step 2). :arrow_backward: ERROR HERE

https://github.com/Tribler/py-ipv8/blob/6dfdf64bd9bdabb768c21f6415c7a050d996ae56/ipv8/messaging/anonymization/community.py#L726-L728

p.s. regarding your :crown: of :gb: comment, I just pointed out that we have different views :eyes: of the subject: for you as the IPv8 maintainer, it is enough to ensure that IPv8 does not crash on random community bugs. Indeed, it does not crash in this case. While for me, as the (hopefully, temporary) AC maintainer, merely demoting an interpreter error to a warning is not enough - I must investigate every such case and solve the root problem :shrug:

qstokkink commented 3 years ago

Thank you. That gave me enough information to understand and reproduce this issue. Feel free to open a PR to resolve this.

As the cache that is left behind cleans itself in 10 seconds, there is no permanent damage to the TunnelCommunity (just the ugly log statement). Therefore, through the definitions of "bug" and "enhancement" used in the IPv8 repository (not some personal view), this issue qualifies as an "enhancement". I agree that it would be nicer to remove the cache immediately though.

Please use the following unit test to verify the new behavior:

Click me! ```patch diff --git a/ipv8/test/messaging/anonymization/test_community.py b/ipv8/test/messaging/anonymization/test_community.py index 5cc6a89..1387497 100644 --- a/ipv8/test/messaging/anonymization/test_community.py +++ b/ipv8/test/messaging/anonymization/test_community.py @@ -1,4 +1,5 @@ -from asyncio import Future +from asyncio import Future, ensure_future, iscoroutine +from functools import partial from unittest.mock import Mock from .mock import MockDHTProvider @@ -14,6 +15,20 @@ from ....messaging.interfaces.udp.endpoint import DomainAddress, UDPEndpoint from ....util import succeed +def _on_packet_fragile_cb(self, source_address, data, circuit_id): + """ + A fragile version of on_packet that crashes on message handling failures. + + These failures won't actually cause IPv8 to crash in production, but you should probably handle these. + + Add overlay classes to use in production mode to the ``production_overlay_classes`` list. + Filter nodes to run in production mode by overwriting ``TestBase.patch_overlays``. + """ + result = self.decode_map_private[data[22]](source_address, data, circuit_id) + if iscoroutine(result): + self.register_anonymous_task('on_packet_from_circuit', ensure_future(result)) + + class TestTunnelCommunity(TestBase): def setUp(self): @@ -45,6 +60,11 @@ class TestTunnelCommunity(TestBase): ipv8.overlay.dht_provider = MockDHTProvider(ipv8.overlay.my_peer) return ipv8 + def _patch_overlay(self, overlay): + super()._patch_overlay(overlay) + if overlay and overlay.__class__ not in self.production_overlay_classes: + overlay.on_packet_from_circuit = partial(_on_packet_fragile_cb, overlay) + def assert_no_more_tunnels(self): """ Utility method to check whether there are no more tunnels left @@ -108,6 +128,28 @@ class TestTunnelCommunity(TestBase): # Node 1 has an exit socket open self.assertEqual(len(self.overlay(1).exit_sockets), 1) + async def test_create_circuit_destruct_initializing(self): + """ + Check if a circuit is destructed and cleaned correctly while still initializing. + """ + self.settings(1).peer_flags.add(PEER_FLAG_EXIT_BT) + await self.introduce_nodes() + + # Let node 0 build a circuit of 1 hop with node 1 + # We immediately remove it, before a response can be received + initializing_circuit = self.overlay(0).create_circuit(1).circuit_id + self.overlay(0).remove_circuit(initializing_circuit, remove_now=True) + + # Let the circuit "creation" commence + await self.deliver_messages() + + # Node 0 should have removed its initializing circuit + self.assertNotIn(initializing_circuit, self.overlay(0).circuits) + # Node 0 should not have any outstanding caches + self.assertListEqual(self.overlay(0).request_cache.get_tasks(), []) + # Node 1 has an exit socket open + self.assertEqual(len(self.overlay(1).exit_sockets), 1) + async def test_create_circuit_no_exit(self): """ Check if 1 hop circuit creation fails without exit nodes. ```