Simplify feature_no_esperanza_tx_relay_delay.py

AM5800 commented 5 years ago

This is another take on #875

Let me remind you how it all began:

2019-04-01 10:27:23.365000 TestFramework (INFO): Test outbound tx relay 5 times. mean: 2.28 sec, median: 1.04 sec
2019-04-01 10:27:40.739000 TestFramework (INFO): Test inbound tx relay 5 times. mean: 1.557 sec, median: 0.366 sec
2019-04-01 10:27:50.644000 TestFramework (INFO): Test outbound vote relay 5 times. mean: 0.212 sec, median: 0.217 sec
2019-04-01 10:27:57.047000 TestFramework (INFO): Test inbound vote relay 5 times. mean: 0.207 sec, median: 0.215 sec
2019-04-01 10:27:57.047000 TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/home/travis/build/dtr-org/unit-e/build/unit-e-x86_64-unknown-linux-gnu/test/functional/test_framework/test_framework.py", line 156, in main
    self.run_test()
  File "/home/travis/build/dtr-org/unit-e/build/unit-e-x86_64-unknown-linux-gnu/test/functional/feature_no_esperanza_tx_relay_delay.py", line 193, in run_test
    assert median(inbound_vote_delays) < median(inbound_delays) / 3
AssertionError

full log

Our intent is to assert that votes are propagated faster than regular transactions. Previous approach was comparing them directly. But it ended up very unstable - sometimes diffusion contribution was very small, sometimes huge. The only right solution to make this stable - is to increase number of transactions we take into account. Current value(5) is just too small. But each extra transaction dramatically increases test running time.

So together with @kostyantyn we decided to take another path: We know that regular transactions are propagated with mean delay of 2(or 5) seconds. So we just assert that votes are faster than that. I've launched this test several times and the slowest vote I have seen on travis took 0.23 seconds. So I set threshold to 0.5 seconds.

This resulted in a much simpler test. And much faster one

Signed-off-by: Aleksandr Mikhailov aleksandr@thirdhash.com

frolosofsky commented 5 years ago

Regardless of the change, is this test a functional scenario? I can imagine that the previous version of the test was a sort of performance test, and, in my opinion, it could go to the cron job with an increased number of transactions.

What the function does the proposed version of the test check? Is 0.5sec a value we declare, rely on, and advertising? Frankly, this PR looks like a try to fit in specific Travis configuration.

I do not see how this test can be a functional in any version. Personally, I'd remove it from functional tests and run on a cron job. And I'm a bit unhappy having proposed version in any form.

kostyantyn commented 5 years ago

We have functionality that there is no propagation delay for finalizer commits and this test shows it. We can question if we need this test. My opinion is we should keep it but will be up to other opinions.

We used to run it on cron but then we kept breaking it and noticed it only after the PR was merged. Was quite annoying for the team so we decided to run it on every Travis build. I'd suggest to keep it in the test list and don't move to cron.

AM5800 commented 5 years ago

I agree with @kostyantyn . I would also like to add, that if we run previous version of this test, let's say, 2000 times - we will get 2(5) seconds mean for regular transactions. I am not sure we really need such comparison because we already know the result (but yes, we could have a test which ensures that diffusion works - but isn't it a different story?)

frolosofsky commented 5 years ago

I would also like to add, that if we run previous version of this test, let's say, 2000 times - we will get 2(5) seconds mean for regular transactions. I am not sure we really need such comparison because we already know the result (but yes, we could have a test which ensures that diffusion works - but isn't it a different story?)

It's a purpose of performance tests to show same numbers and their correlation, actually. Anyway it's neither NACK nor ACK, just my 2 cents.

scravy commented 5 years ago

we could have a test which ensures that diffusion works

Although this might be a different story, we should have such a test. I can guarantee you things will break, for example when merging with bitcoin 0.17. 0.18. Anything.

dtr-org / unit-e

Simplify feature_no_esperanza_tx_relay_delay.py #878