docker-mailserver / docker-mailserver

Production-ready fullstack but simple mail server (SMTP, IMAP, LDAP, Antispam, Antivirus, etc.) running inside a container.
https://docker-mailserver.github.io/docker-mailserver/latest/
MIT License

Improve development by modularizing the tests #1206

Closed martin-schulze-vireso closed 3 years ago

martin-schulze-vireso commented 5 years ago

Currently the tests are a large monolith which makes their development cumbersome. They should be separated into several lightweight suites that can be called independently.

Context

Taken from https://github.com/tomav/docker-mailserver/pull/1185#issuecomment-517947172

martin-schulze-vireso commented 5 years ago

I think the separation into a test suite per container would allow for the setup/teardown functionality of BATS to take that over from the Makefile. However, we still should do the common setup at a central place.

erik-wramner commented 5 years ago

See PR #955 (unfortunately a bit outdated and never fully completed).

fbartels commented 5 years ago

would allow for the setup/teardown functionality of BATS to take that over from the Makefile

That sounds like an interesting proposition, although the Makefile could still be used for proper orchestration of tasks.

martin-schulze-vireso commented 5 years ago

@erik-wramner Thanks, I will have a look at that.

@fbartels

the Makefile could the still be used for proper orchestration of tasks.

Yeah, I'd envision something like a pattern rule for .bats files in the test folder, which would allow running specific tests via make test/<suite-name>.bats, while the general test target depends on all test/*.bats files.
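A minimal sketch of what such a Makefile could look like (the bats binary location and the FORCE idiom are assumptions, not the project's actual setup):

```makefile
# Hypothetical Makefile fragment; BATS path is an assumption.
BATS ?= bats

# `make test` runs every suite in test/.
.PHONY: test
test: $(wildcard test/*.bats)

# Pattern rule: `make test/<suite-name>.bats` runs a single suite.
# FORCE makes the target always out of date, so the recipe always runs
# even though the .bats file itself exists and is "up to date".
test/%.bats: FORCE
	$(BATS) $@

FORCE:
```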

The biggest problem I see right now is that setup and teardown are called per test case, which might be prohibitively slow if we start and stop/remove the container each time. It may be workable if we get the sleep times down as much as possible, but I am afraid we will only find out by trying.

martin-schulze-vireso commented 5 years ago

@erik-wramner After skimming through the diff and rebasing it onto current master I am a bit at a loss: there are close to 200 tests in the project, and the PR seems to be mostly WIP. Since there never was a green state, I am not sure it is the right starting point. I might get some inspiration there, but it would probably be much more helpful if @17Halbe could give me some advice on the pitfalls that were encountered so I don't make the same mistakes again. However, since so much time has passed, most of that knowledge might already be lost.

The main takeaway from the PR for me is: Relying on the (sleep) timing is very brittle, but getting rid of that is already one of the main goals of this ticket.

erik-wramner commented 5 years ago

Right. I think it also underscores the importance of trying to do this in reasonably small steps, otherwise we may end up with another big PR that is never completed if you run out of time for some reason.

fbartels commented 5 years ago

In https://github.com/zokradonh/kopano-docker I am relying on waiting for docker-mailserver to become active on port 25 (which can take a while, since it generates a dhparam file on first startup) before actually trying to make use of it.

I am wondering if it would additionally make sense to move test execution into Docker containers. I am currently rerunning tests on my system to figure out a test failure and see the tests modifying files in the repo that I constantly have to restore to their last committed state.

erik-wramner commented 5 years ago

That would be nice because it would hopefully make it possible to run the tests on Windows and macOS as well. Currently they only run successfully on Linux hosts. I'm not sure how much work it would be, though.

fbartels commented 5 years ago

Another test suite ticket with some suggestions: https://github.com/tomav/docker-mailserver/issues/324

martin-schulze-vireso commented 5 years ago

And another one #827 which deals with using docker-compose for the container setup.

martin-schulze-vireso commented 5 years ago

@erik-wramner

it also underscores the importance of trying to do this in reasonably small steps

Exactly, the other two tickets also seem to show the same problem: They got stuck after some time because the reporter did not have time anymore.

@fbartels

I am relying on waiting for docker-mailserver to become active on port 25

This is what I was planning to do. We probably also need something to wait for emails to be delivered for the tests to work. Currently I am thinking about an extension to BATS which would add something like a RUN_UNTIL_SUCCESS_OR_TIMEOUT helper that repeats a command until it succeeds or the timeout stops it. Some of the email delivery tests currently use sleeps to ensure the email has been processed.
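A minimal bash sketch of such a helper (the function name mirrors the proposal above; everything here is illustrative, not an existing BATS API):

```shell
#!/usr/bin/env bash
# Repeat a command until it succeeds or the timeout (seconds) expires.
# Hypothetical helper; polls once per second.
repeat_until_success_or_timeout() {
  local timeout="$1"
  shift
  local start
  start="$(date +%s)"
  until "$@"; do
    if (( $(date +%s) - start >= timeout )); then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# Example (not run here): wait up to 60s for the SMTP port to open.
# repeat_until_success_or_timeout 60 nc -z mail.example.test 25
```

On a fast system the wait ends as soon as the command succeeds, while a slow CI simply uses more of the timeout budget, which is exactly the property the sleeps lack.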

(which can take a while, since it generates a dhparam file on first startup)

Strange, doesn't the docker build of the Dockerfile already generate some dhparam files?

I think the best course for this ticket would be to separate a small set of tests into its own file and see what needs to be done to get it green and move the container run into setup/teardown. Then we can measure and see if it is a viable scheme to go forward with the rest of the tests.

erik-wramner commented 5 years ago

We would need to wait for at least:

Probably a combination of log parsing and checking network sockets could work. For processing mail we could perhaps also parse logs.
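The log-parsing half could be sketched like this in bash (the marker line and file-based approach are assumptions; a real check would grep the output of docker logs mail instead):

```shell
#!/usr/bin/env bash
# Poll a log file until a marker line appears or a timeout (seconds) expires.
# Illustrative only: in the real tests the "log file" would be
# `docker logs mail`, and the marker e.g. a postfix startup line.
wait_for_log_line() {
  local file="$1" pattern="$2" timeout="${3:-30}" waited=0
  until grep -q "$pattern" "$file" 2>/dev/null; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep 1
    (( waited += 1 ))
  done
}
```

Combining this with a socket check (e.g. nc -z on port 25) would cover both "the daemon says it is up" and "the port actually accepts connections".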

There are several dhparam files; both postfix and dovecot have them. However, they should only be generated on first run and at build time. We could provide them in the test container, but then we won't test the dhparam generation code.

fbartels commented 5 years ago

There are several dhparam files; both postfix and dovecot have them. However, they should only be generated on first run and at build time.

Yes, this was also one of the todos I have written down so far. While there is no (big) harm in having a public dhparam file, I would rather see it generated at startup to have a unique one per machine. Also, having multiple dhparam files does not really do much good (there is even a cron job to rotate the dhparam weekly). I think this should be changed to only create the dhparam at start (if it's not there already) and just create one that is used by all components.

For the test case it could indeed be a possibility to have a pre-generated one and copy/mount it into the image before the start.

martin-schulze-vireso commented 5 years ago

Sharing some insight from the WIP PR above:

  1. After the docker build there seem to be no more time-consuming steps in starting the container (at least with the environment settings in the test). So the dhparam stuff seems to happen only in the build process?
  2. The timeouts will drastically improve the speed of the succeeding path but won't help much if there are errors. The check that startup was successful should probably be an early out for each test to avoid very long runs if something basic breaks.
  3. The Makefile provides a very easy way to run all or specific test suites.
  4. If the performance of starting/removing the container per test case is too bad, we can do it per test suite via https://github.com/sstephenson/bats/issues/3#issuecomment-382664336
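The workaround from that linked comment boils down to guarding the expensive work with the test counter; a sketch (the container start is a placeholder):

```shell
#!/usr/bin/env bash
# BATS calls setup() before every test case, so guard the expensive
# container start with the test number to run it only once per file.
setup() {
  if [ "${BATS_TEST_NUMBER:-1}" -eq 1 ]; then
    # placeholder for e.g.: docker run -d --name mail ...
    CONTAINER_STARTED=1
  fi
}
```

The symmetric teardown guard needs to know the number of the last test, which BATS did not expose at the time; that gap is part of why the thread pushes for proper setup_file/teardown_file support in bats-core.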

martin-schulze-vireso commented 5 years ago

So I started more wide-ranging work on this in #1217 and already hit an ugly problem: the solution from the comment above only runs setup/teardown once per bats invocation, which means bats test/*.bats will not run it once per file but only once for all files. I worked around that, but a good solution would need support from bats-core itself (see https://github.com/bats-core/bats-core/issues/39).

@erik-wramner @fbartels Please have a look at #1217 and tell me if the current state is still good enough to proceed or if we need to sort out the problem within bats-core first.

erik-wramner commented 5 years ago

I think the workaround is good enough; getting support from bats-core can take a lot of time given that there is no clear consensus in that project as to how it should work! @fbartels?

One thing that was mentioned in one of the other test issues is that it would be nice to have some common tests that run in all containers; for example simple successful mail delivery. Now that you are extracting tests to separate files, would it be possible to define a file with common test cases that is included by all the test files and that uses the container under test as target?

martin-schulze-vireso commented 5 years ago

would it be possible to define a file with common test cases that is included by all the test files and that uses the container under test as target?

I am not sure. There is the load functionality but I don't know if it can pull in tests. I guess we can only try that.

martin-schulze-vireso commented 5 years ago

@erik-wramner Okay, I made a simple test where an empty file loads a file with a test inside and then ran the empty file. Result: 0 tests detected.

This is unsurprising, since bats uses a handwritten preprocessor to translate the bats file's @tests into another file where they are normal bash functions with known names which then get called.

This leaves us with the question: can we still inject common code into all files? We could still write a suite of common tests as ordinary functions, tangle them together in one function that runs them all and let all .bats define a test that calls this aggregation function.

Pros: Only two additional lines (load 'common-sending-test' and @test "common sending test" { aggregation_function container_name }). It won't get much smaller.

Cons: There would literally be only one test per file for a whole slew of potential tests to run.

I think this is still a workable solution as long as we get good stack traces when errors occur.
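A sketch of what that could look like (all file, function, and container names here are invented for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical common-sending-test.bash: ordinary bash functions instead
# of @test blocks, so they survive bats' preprocessor when pulled in
# via `load`.
check_smtp_banner() { echo "banner ok on $1"; }       # stand-in for a real nc/grep check
check_basic_delivery() { echo "delivery ok on $1"; }  # stand-in for a real delivery check

# Aggregation function; each .bats suite adds only two lines:
#   load 'common-sending-test'
#   @test "common sending test" { run_common_sending_tests <container-name>; }
run_common_sending_tests() {
  local container="$1"
  check_smtp_banner "$container"
  check_basic_delivery "$container"
}
```

Since bash stops a function chain at the first failing command under bats' errexit handling, a failure in any sub-check still fails the single wrapping @test, at the cost of coarser reporting.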

erik-wramner commented 5 years ago

I agree, it is far better than duplicating the code and running the same basic tests in all containers can protect against stupid mistakes.

erik-wramner commented 5 years ago

@martin-schulze-vireso after merging the PR we get "not ok 1 checking default relay host: default relay host is added to main.cf" from time to time. It is not consistent. I get it locally as well about once in three builds.

Probably the "container is ready" test needs to be a bit more conservative in this case.

martin-schulze-vireso commented 5 years ago

Unfortunately, I can't reproduce this locally. I did 20+ runs (make test/default_relay_host.bats) without failure. Even cutting down the timeout repeat wait time did not change this. The first commit in #1238 adds some more diagnostic output in case of errors, can you rerun with this?

What is your environment? Is there any more error output in this case? (Maybe we should open another ticket for this?)

erik-wramner commented 5 years ago

I get it in several places, but not every time:

Unfortunately I tend to rebuild in Travis to get a green build (this is not normally what I want to test) so I'm not sure if I have left an example. I'll let you know when that happens though there is not much to see. I'll try to run your branch locally when I get time.

I'll open a new ticket next time I can reproduce it.

erik-wramner commented 4 years ago

See #1264, happens every now and then.

martin-schulze-vireso commented 4 years ago

The last PR gave me some headaches.

Amavis sometimes hangs during bootup, which leads to errors (as can be seen in the merge commit's Travis run). I don't have a hypothesis why this is happening yet.

erik-wramner commented 4 years ago

Actually the error in the merge commit Travis run is quite interesting. Most of these errors only happen when we run the tests, but there are at least two users who have reported that issue for real.

georglauterbach commented 4 years ago

What happened in the meantime?

martin-schulze-vireso commented 4 years ago

What happened in the meantime?

Nothing on my part. I got stuck with PRs in bats-core.

georglauterbach commented 4 years ago

I understand. Anything that can be done about this?

martin-schulze-vireso commented 4 years ago

Depends on what you mean by this.

It has been a while since I looked here so I don't know if the issues discussed here still persist.

georglauterbach commented 4 years ago

I'm here for the same reason, finding out whether this is still relevant. Let's see if someone else knows more.

martin-schulze-vireso commented 4 years ago

I took a deeper look now and at least the checkboxes in the issue description are still valid.

  1. We have a large tests.bats which contains countless tests that should not be lumped together.
  2. The setup process for the containers is not done per test (file), which might lead to interference between tests.
  3. Sleeps are everywhere to deal with startup latency issues. However, the timing depends on the CI's load, which makes it either unreliable or slow.

I did not yet look into the specific issue discussed in Erik's comment, but I think this issue is still very relevant.

georglauterbach commented 4 years ago

This analysis is worth a lot. We now know where we stand again. Thanks!

Now, let's face it: I just worked on #1601 and #1605, which took a great amount of time. I now have to invest it somewhere else. Therefore, we need someone or some people who take care of the involved sub-problems here.

martin-schulze-vireso commented 4 years ago

Therefore, we need someone or some people who take care of the involved sub-problems here.

Well, my original plan was to just get setup/teardown_file functionality landed in bats-core and come back here to continue. In theory that is achieved but it was a large rewrite that introduced kinks which I am currently working to iron out.

There are still some small steps towards test modularization, which would ultimately do away with starting the containers from the Makefile, that can be made with the current bats version. I'll see if I can put some time into that in the next week.

georglauterbach commented 4 years ago

@martin-schulze-vireso That's great news! We'll prioritize this issue and I'll pin it. Thanks for the work.

martin-schulze-vireso commented 4 years ago

Just to make it clear: There is still a lot to do and what I proposed above is only a small part of that. So I don't mind others helping. :)

georglauterbach commented 4 years ago

That's why I pinned it :) Others who are helping are always welcome!

martin-schulze-vireso commented 3 years ago

@aendeavor I am not sure if everything from the OP is finished.

georglauterbach commented 3 years ago

@martin-schulze-vireso was

  • [ ] replace sleeps with health checks and timeouts

fixed by your PR?

martin-schulze-vireso commented 3 years ago

Partially, there are still 31 direct sleep commands in *.bats files that should probably be replaced by polling+timeouts.

However, we could also wait and see how it plays out now. The main goal of removing the sleeps is to improve performance and test reliability at the same time. The sleeps have to be very high to account for slow CI systems, which in turn slows down faster systems unnecessarily. Just to compare: the currently active sleeps introduce on average between 5 and 10 seconds of wait time each, so you get a total of 2 to 5 minutes of wait time. I don't know how much of that can be cut off though.

georglauterbach commented 3 years ago

Yeah, we should wait. Premature optimization is the root of all evil :)

We'll come back to this issue one way or the other.

martin-schulze-vireso commented 3 years ago

I'd not see this as premature optimization. Cutting 5 minutes off a 20-minute run would be a substantial improvement, even more so on faster systems, where the 20 minutes might only be 10 but the sleeps remain at 5.

Nevertheless, correctness is the more pressing issue, while speed is nice to have.

georglauterbach commented 3 years ago

That was just a quote from Donald Knuth because as you said,

we could also wait and see how it plays out now

I'm all for substantial improvements. But having seen two or three test runs with real PRs or other commits might be a good idea as well. However, I'm all for getting rid of sleeps.

martin-schulze-vireso commented 3 years ago

So the latest merge's tests (https://travis-ci.org/github/tomav/docker-mailserver/builds/740672232) failed in test 275 checking quota: warn message received when quota exceeded. It obviously did so during my refactoring, because I added debug output that comes in handy right now:

$ docker exec mail ls -l '/var/mail/otherdomain.tld/quotauser/new/'
-rw-r--r-- 1 docker docker 9855 Nov  1 20:22 1604262175.M451140P10072.mail.my-domain.com,S=9855,W=9900
-rw-r--r-- 1 docker docker 9855 Nov  1 20:22 1604262175.M459077P10073.mail.my-domain.com,S=9855,W=9900
-rw-r--r-- 1 docker docker   87 Nov  1 20:22 1604262175.M473195P10076.mail.my-domain.com,S=87,W=91

It looks like the 10k quota was ignored and we got the mail delivered twice?

georglauterbach commented 3 years ago

Clever idea integrating some debug messages!

I would like to help here, but truth be told, I haven't the faintest idea...

georglauterbach commented 3 years ago

@martin-schulze-vireso I will push a commit shortly which provides the shellcheck integration in test/linting/lint.sh for all files test/*.bats. This is commented out, but if you'd like to play with it, you can now do so while using the production-ready implementation.

I will provide the commit (id) here: 2f840d7da5ae65a2be58e64f9ac772db4c44490e

martin-schulze-vireso commented 3 years ago

In https://travis-ci.org/github/tomav/docker-mailserver/builds/741619130 test 78 failed on line 68 here:

https://github.com/tomav/docker-mailserver/blob/9ce719213bf0cffae4542c13c97d5a957903c332/test/mail_smtponly.bats#L61-L70

The test looks a bit wonky to me: doesn't the reload trigger a short downtime on the SMTP port?

georglauterbach commented 3 years ago

I've had issues with this test locally too, and can therefore confirm that this test is indeed wonky. Can you do something about that?

martin-schulze-vireso commented 3 years ago

Well, we could put a wait for the SMTP port before line 66, since nc obviously says it is down.

georglauterbach commented 3 years ago

Will 5 seconds be sufficient, or do we need more?

EDIT: I provided a PR (#1682) which adds 5 seconds of sleep.

martin-schulze-vireso commented 3 years ago

I meant: we should use the wait_for_smtp_port_in_container function. The timeout should be automatic.