[Tests] If exception raised while delivery, it hides from pytests

drew2a commented 3 years ago

Reported in https://github.com/Tribler/tribler/issues/5677

The following test describes and shows this behavior:

import json
from binascii import unhexlify

import pytest

from ipv8.community import Community
from ipv8.lazy_community import lazy_wrapper
from ipv8.test.base import TestBase
from tribler_core.modules.metadata_store.community.remote_query_community import RemoteSelectPayload

class TestCommunity(Community):
    community_id = unhexlify('3858429c966a4eaba10ec5cde93a70e35152ff50')

    def __init__(self, my_peer, endpoint, network):
        super().__init__(my_peer, endpoint, network)
        self.message_handled = False
        self.add_message_handler(RemoteSelectPayload, self.on_message)

    @lazy_wrapper(RemoteSelectPayload)
    async def on_message(self, peer, request):
        self.message_handled = True
        raise Exception

    def message(self, **kwargs):
        for p in self.get_peers():
            self.ez_send(p, RemoteSelectPayload(0, json.dumps(kwargs).encode('utf8')))

class TestTest(TestBase):
    def setUp(self):
        super(TestTest, self).setUp()
        self.initialize(TestCommunity, 2)

    async def test_test(self):
        community1 = self.nodes[0].overlay
        community2 = self.nodes[1].overlay

        assert not community2.message_handled

        # send message
        community1.message()

        # no Exception (but it should be raised)
        await self.deliver_messages(timeout=0.5)

        # ensure that method has been handled
        assert community2.message_handled

        # ensure that exception is raised while direct call
        with pytest.raises(Exception):
            community2.on_message(None, None)

qstokkink commented 3 years ago

The fact that no message handler can crash IPv8/Tribler is a production feature, not a bug. The only way to make IPv8 crash it to schedule calls outside of the control of IPv8 (which is something you should avoid as much as possible) - this has bit us time and time again in the past: the most famous one is the MarketCommunity packet of death (ask @devos50).

There are two resolutions to your issue:

Disable production safeties: this is not preferred, as you will be testing different code than is actually in production and you are depending on randomly stumbling into an Exception due to test randomness.
Cover all your Community lines in your unit tests exhaustively: this is preferred, but takes more time.

devos50 commented 3 years ago

True, that is definitely a feature and a very helpful one.

the most famous one is the MarketCommunity packet of death

Yup, that one definitely has a place in our hall of fame (or shame? 🤔 )! Some students managed to bring down a large part of our live network through a bug where they included an out-of-bounds integer timeout value in a packet. This packet was disseminated in the network using a simple TTL mechanism. When receiving this packet, a client would first forward the packet to neighbours and then evaluate the packet - which would crash Tribler. Turned out that I forgot an out-of-bound check on this timeout value...

To prevent such detrimental bugs from happening, we catch all errors produced when handling the message and log the traceback. We should, however, modify our testing framework to also fail when one of these errors occur. Maybe we could disable this behaviour in MockIPv8?

qstokkink commented 3 years ago

We should, however, modify our testing framework to also fail when one of these errors occur. Maybe we could disable this behaviour in MockIPv8?

It would be very easy to do this. However, I oppose this. This is a feature of the Community class and, if you rip it out, you are modifying the code under test.

In my opinion a test should test the code it claims to be testing and not test whether mocking code has certain mocked behavior. Naturally, there are exceptions when you would want a mock instead of the real code. However, I'm not seeing that exception to the rule here. Checking the effects of the code under test would be sufficient to resolve this issue.

drew2a commented 3 years ago

Tests should reveal bugs, not hide them.

The feature described above looks like a feature for production, but like a bug for testing.

@qstokkink in other words, you propose in tests like https://github.com/Tribler/tribler/blob/devel/src/tribler-core/tribler_core/modules/metadata_store/community/tests/test_remote_query_community.py#L280-L293 to test both the "send" method and the "receive" method explicitly?

Please note, that the test https://github.com/Tribler/tribler/blob/devel/src/tribler-core/tribler_core/modules/metadata_store/community/tests/test_remote_query_community.py#L280-L293 raise no exception (because it is a feature), but in reality, this behavior leads to the application crush: https://github.com/Tribler/tribler/issues/5677

qstokkink commented 3 years ago

Tests should reveal bugs, not hide them.

Tests should assert mismatches between expected and delivered functionality of production code. Bugs are the cause of this mismatch. I don't see how a test could actively hide a bug, unless the test is constructed very poorly.

@qstokkink in other words, you propose in tests like https://github.com/Tribler/tribler/blob/devel/src/tribler-core/tribler_core/modules/metadata_store/community/tests/test_remote_query_community.py#L280-L293 to test both the "send" method and the "receive" method explicitly?

No.

I propose the tests assert the functionality of your function calls. The only assert in the test you linked is assert True, this is an anti-pattern, see: http://www.everydayunittesting.com/2017/03/unit-testing-anti-pattern-not-asserting.html Just because the line is covered, doesn't mean the test is good.

For example, a valid assertion would be that an illegal query does not yield database results for the node sending the query.

Lastly, excessive mocking of the code under test, to keep track of individual "send" and "receive" calls, is also a testing anti-pattern (colorfully called the "anal probe").

Please note, that the test https://github.com/Tribler/tribler/blob/devel/src/tribler-core/tribler_core/modules/metadata_store/community/tests/test_remote_query_community.py#L280-L293 raise no exception (because it is a feature), but in reality, this behavior leads to the application crush: #5677

That occurred because of the lack of exception suppression, not its inclusion. The stacktrace of #5677 is not within a message handler, which is what O.P. is about. That is another discussion.

drew2a commented 3 years ago

I propose the tests assert the functionality of your function calls. The only assert in the test you linked is assert True, this is an anti-pattern, see: http://www.everydayunittesting.com/2017/03/unit-testing-anti-pattern-not-asserting.html

The article is about the situation when you have unit tests, integration tests, and end-to-end tests. In this case, if we write a unit test, that will pass when no exception — it is an antipattern.

But the ability that provides TestBase is more close to the integration tests, rather than to unit tests. So, nothing wrong to check the absence of an exception in that case.

Lastly, excessive mocking of the code under test, to keep track of individual "send" and "receive" calls, is also a testing anti-pattern (colorfully called the "anal probe").

Cool, that at least it is clear.

qstokkink commented 3 years ago

The article is about the situation when you have unit tests, integration tests, and end-to-end tests. In this case, if we write a unit test, that will pass when no exception — it is an antipattern.

I don't understand your point. We have different types of tests and unit tests should pass when there's no exception.

But the ability that provides TestBase is more close to the international tests, rather than to unit tests. So, nothing wrong to check the presence of an exception in that case.

Assuming you meant "integration" and not "international": I would agree that if there was a method inside of your custom community that functioned as a unit (a manager pattern for example) and raised an exception, this would be fine. However, the on the level of abstraction of Community interaction, crashing is unacceptable.

ichorid commented 3 years ago

I agree with @drew2a that tests must fail if there is any exception during execution. Could we disable IPv8 error suppression, at least in Tribler tests?

drew2a commented 3 years ago

@qstokkink yes, "integration" not "international" 🤦 (fixed). Also, change: "absence of an exception in that case" instead of "presence of an exception". Now it should be more clear.

drew2a commented 3 years ago

We have different types of tests and unit tests should pass when there's no exception

Yes, that is true. But mostly of Tribler's tests are not unit tests. I would say all the tests, inherited fromTestBase are not unit tests.

qstokkink commented 3 years ago

I would say all the tests, inherited from TestBase are not unit tests.

TestBase is just a wrapper around unittest.TestCase that provides deadlock detection (disabled for pytest) and mocking infrastructure for Community subclasses. The "fatness" of the tested units is up to the application using it. Even so, be it either fat unit tests or integration tests, both are supposed to assert at least some sort of state change when functionality is invoked.

I agree with @drew2a that tests must fail if there is any exception during execution. Could we disable IPv8 error suppression, at least in Tribler tests?

Yes. But disabling the IPv8 error supression still won't make the test of the OP fail.

I may have given you all false hope as to how much disabling the error supression solves. There is still much more. Consider this typical construction:

def test_a_thing():
    import asyncio
    def this_is_bad():
        raise Exception()
    asyncio.get_event_loop().call_soon(this_is_bad)

There is no IPv8 error suppression and the test still passes. This is equally true for unawaited coroutines: pytest happily eats the error, but if you use nosetests for test execution you will at least see RuntimeWarning: coroutine 'TestTest.test_a_thing.<locals>.this_is_bad' was never awaited. Enabling debug mode on the event loop will even show you a traceback, but still not fail.

AFAIK to truly figure out if an asynchronous definition without a known handle had an error you would need to hotpatch asyncio or add a custom event loop.

I don't know of any sane solution to this problem. If you can solve it I would happily supply you with the 3 lines of code needed to disable the error suppression. However, for now, I would bank on proper unit testing and typing.

EDIT: An earlier version of this post suggested you could detect call_soon with a RuntimeWarning, this is only for unawaited coroutines.

qstokkink commented 3 years ago

I spent some time prodding around and printing tracebacks today. I don't have anything that works just yet, but I think there's a chance to get this working without too much magic. I'll assign myself to this issue and give this some attention.

qstokkink commented 3 years ago

Who knows what unspeakable horror I am about to unleash on the codebase.. but I have an implementation 😃

qstokkink commented 3 years ago

By the way, I didn't solve the coroutine ... was never awaited problem. But, that was also not the cause of OP. So, I'll take the liberty of ignoring that problem.

Tribler / tribler

[Tests] If exception raised while delivery, it hides from pytests #5683