duritong / trocla

A password store for password creation and retrieval
Other
75 stars 18 forks source link

test suite fails 1/278 of the time #55

Closed sanvila closed 2 years ago

sanvila commented 5 years ago

Hi. While building trocla on Debian I found that the test suite fails sometimes:

expected "jIFVPdkL0dAKYWn1mvy9JwaStFCkc3mX" to match /[+%\/@=\?_.,:]+/

I estimate that this happens 0.359% of the time, or 1 time every 278 tries. Original bug report here:

https://bugs.debian.org/854714

Is the test suite buggy, or maybe the program is failing to meet its own specs?

Thanks.

duritong commented 5 years ago

It is a simple matter of probability. For the sake of reproducibility I would disable this test.

anarcat commented 2 years ago

do you have any suggestion on how to disable that specific testt? it's kind of deep inside the spec...

anarcat commented 2 years ago

i'll also point at a comment in the Debian BTS:

Hi. I've noticed this is still happening in buster, and contrary to what I suggested in the initial bug report, the number of CPUs have absolutely nothing to do with this. To reproduce, just build the package a lot of times. I've added some more build logs here:

https://people.debian.org/~sanvila/build-logs/trocla/

If a failing test does not mean that the package is buggy and does not follow its own specs, then the failing test should be disabled. The attached patch disables the failing test.

OTOH, if a failing test means the package is buggy but the package is useful anyway, then there is no point in making the build unsuccessful when the test fails. In this case it would be better to disable the failing test as well.

[ There is a third option, that the failing test means the package is buggy enough not to be trusted. If this is the case, the logical thing would be not to release buster with this package ].

I tend to agree here: if this test fails randomly, isn't there a problem with the original code, or with the test's design? couldn't some monkeypatching improve the behavior of the test here?

duritong commented 2 years ago

The test is kind of stupid. It tests whether a random generated string based on a set of characters with some special characters does include any of the special characters. However, as we all might know a thing about probability: it is totally reasonable that a random string might not consist of any of the special characters. It might very well just consist of 'aaa' or 'bbb' and even this 20 times in a row.

I had added the test in the early days to somehow verify that passing down the required set of characters works. It was a naive implementation, as I already knew math a quite well then (maybe even better than now), but it worked on my machine and it proofed that the random generator included the set of characters with the special characters. Obviously, not if you continuously execute that test.

What shall we do? Can we somehow test that we are using the expected set of characters without having to hope to beat probability? Or should we just silently drop it? Happy to merge any suggestions.

anarcat commented 2 years ago

On 2022-05-03 20:43:07, duritong wrote:

What shall we do? Can we somehow test that we are using the expected set of characters without having to hope to beat probability?

I think that, for testing purposes, the RNG could be seeded in such a way that you control the output. What you are testing here is that function A takes over function B, basically. You don't actually need the output to be random. I haven't looked at the tested code, but maybe decoupling things could help? I don't know.

Or should we just silently drop it? Happy to merge any suggestions.

I think that having the test somewhat skippable would help. I remember people in the #debian-ruby IRC channel having a good idea on this, but I lost the backlog. I'll see if I can recover it from the collective consciousness.

a.

-- Vivre tous simplement pour que tous puissent simplement vivre.

duritong commented 2 years ago

I guess returning a fix string from the random function in that test case is all we want, since we are testing that it is called with the right profile (and thus the right charset) by mocking only that particular function call

anarcat commented 2 years ago

awesome, that's pretty much what i had in mind. i think. :)