BioContainers / specs

BioContainers specifications
http://biocontainers.pro
Apache License 2.0

BioDocker Future Ideas and Implementations: testing of containers #60

Closed pcm32 closed 3 years ago

pcm32 commented 8 years ago

I thought I would spin off a derived thread from the one initiated by @ypriverol in #59, dedicated essentially to container testing. In the context of a project I'm working on, we are drafting some proposals on how to test containerised tools. It would be great to have the thoughts and ideas of the biodocker community on this.

I agree with what is said in #34 about testing being time- and resource-consuming, yet I think it would be good for biodocker/biocontainers to at least have guidelines on how to test containers.

bgruening commented 7 years ago

I know my view on this is a little bit radical, but FWIW I do not think we should recreate our own tests just to test Docker containers. Docker is just another deployment and packaging format, one among many. We should take what is already there and reuse it: for example the Debian Med testing experience and test cases, the tests defined in conda packages, etc.

The mulled-generated Docker containers reuse the Conda tests and run (currently only one) test after the container is assembled. This is completely automated and can be extended as we wish.

But all of this only tests the installation; we trust the upstream unit tests at this point and do not test results and correctness. That, on the other hand, is done further down the road, for example in Galaxy tools, where we compare the output of Docker containers and try to find regressions.
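A minimal sketch of this "reuse the upstream tests" idea, assuming the upstream test commands have been extracted into a plain file (the image name and file layout are hypothetical; mulled/involucro wire this up differently in practice):

```shell
#!/bin/sh
# Sketch: run upstream-defined test commands (e.g. extracted from the
# `test: commands:` section of a conda recipe) inside a container as
# installation tests. IMAGE is a placeholder, not a real image.
IMAGE="quay.io/biocontainers/mytool:1.0--0"

run_upstream_tests() {
    # $1 = runner prefix (e.g. "docker run --rm $IMAGE"),
    # $2 = file with one test command per line
    while IFS= read -r cmd; do
        [ -z "$cmd" ] && continue
        if sh -c "$1 $cmd" >/dev/null 2>&1; then
            echo "PASS: $cmd"
        else
            echo "FAIL: $cmd"
        fi
    done < "$2"
}

# Usage against a real container:
#   run_upstream_tests "docker run --rm $IMAGE" commands.txt
```

With an empty runner prefix the same loop runs the commands on the host, which is handy for checking the harness itself before involving docker.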

ypriverol commented 7 years ago

@bgruening @pcm32 @BioContainers/contributors About testing: we have been thinking about this for a while. As in software development, there should be different levels of testing, with different complexity depending on the type of software. We can't cover all the use cases, but we can define best practices and how each container can be tested. In my opinion we should define three main categories of tests: functional container, functional software, and fully-tested software.

What do you think? @sauloal, how can we implement this in the specification? BTW, the names can be changed.

timosachsenberg commented 7 years ago

I like the distinction between levels. The definition of "fully-tested software" is a bit hard to achieve for larger frameworks like OpenMS. Just a suggestion for multi-tool containers: "the container qualifies as fully-tested if at least one test exists for every tool in the container." Of course one could use the more stringent definition, but for the "fully-tested" level this might result in e.g. thousands of additional tests that don't make real sense (it would be more sensible to test certain combinations of parameters rather than each single one). Regarding test documentation: would it be sufficient to link to the source code? Otherwise we would need to add documentation for each and every existing test (and we actually want to reuse existing ones). I know these are mainly "big framework" problems that probably affect only a minority of containers, so I don't have a too strong opinion on this.

ypriverol commented 7 years ago

@timosachsenberg I really think testing is difficult to define. For that reason I prefer something like levels of testing and a more practical approach: a functional container test should be easy to implement; a functional software test should exercise one software parameter (probably including test data); fully-tested should cover all parameters of the tool. Here I would like to point out that orchestration systems and workflow frameworks like OpenMS, Galaxy, etc. should be considered differently, and it will always be difficult to test all of their functionality. In those cases we can think about how to expose their internal tests.

sauloal commented 7 years ago

Functional container is what @bgruening has: a simple command (such as --version). For me, this is our only obligation. Anything else has to be left to users and bug reports.
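A functional-container check along these lines could be sketched as follows (the image and tool names are placeholders, not actual BioContainers conventions):

```shell
#!/bin/sh
# Minimal "functional container" smoke test: run the tool's --version
# inside the container and fail on a non-zero exit code.
# IMAGE and TOOL are hypothetical placeholders.
IMAGE="biocontainers/mytool:v1.0.0"
TOOL="mytool"

smoke_test() {
    # $1 = runner prefix (e.g. "docker run --rm $IMAGE"),
    # $2 = tool executable name
    if $1 $2 --version >/dev/null 2>&1; then
        echo "PASS: $2 --version"
        return 0
    else
        echo "FAIL: $2 --version"
        return 1
    fi
}

# Usage against a real container:
#   smoke_test "docker run --rm $IMAGE" "$TOOL"
```

The non-zero return code on failure is what lets a CI job mark the build as broken without parsing any output.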

For functional software and fully-tested, we could think of a separate project with extremely large infrastructure mimicking the Phoronix Test Suite, which is used to test the Linux kernel. Too much for me/us, in my opinion.

ypriverol commented 7 years ago

Then we all agree on the functional container test; we should make it mandatory for all containers and, if possible, integrate it into the BioContainers specification.

For the second level, functional software, I would like to have at least one unit test per software to see that it works. --version, for example, can succeed while none of the actual functions of the software pass, because a crucial dependency may be missing.

Fully-tested can apply to single containers where the developer and maintainer wants to guarantee that all functionality works, say a single container that converts one file format to another. The maintainer would probably like to provide a set of test cases to fully test the container. This will enable best-practice guidelines, especially for single containers. I agree that more complex pipelines should be tested in other environments.
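As a sketch of such a fully-tested converter setup, one could loop over stored input/expected-output pairs (the directory layout, image name, and file names are all hypothetical):

```shell
#!/bin/sh
# Sketch for a "fully-tested" converter container: for every test case
# with a stored expected output, run the conversion and compare the
# result byte for byte. The docker command shown in the usage note is
# a placeholder.
run_case() {
    # $1 = shell command producing the converted output on stdout,
    # $2 = path to the expected-output file
    tmp=$(mktemp)
    sh -c "$1" > "$tmp" 2>/dev/null
    if cmp -s "$tmp" "$2"; then
        echo "PASS: $1"
        rc=0
    else
        echo "FAIL: $1"
        rc=1
    fi
    rm -f "$tmp"
    return $rc
}

# Example over a hypothetical tests/ layout:
#   for case in tests/*/; do
#       run_case "docker run --rm -v $PWD/$case:/data converter /data/input.a" \
#                "$case/expected.b"
#   done
```

Byte-for-byte comparison with `cmp -s` is the strictest check; tools with nondeterministic output (timestamps, floating point) would need a fuzzier comparison.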

julianu commented 7 years ago

I also agree with @sauloal that a simple test for the container should be sufficient. The software running in the container should obviously be tested as well, but that can happen on the software side itself, with CI or something similar, and of course bug reports etc.

One easy-to-implement test, though, might be running a "default case", which would be much the same as one unit test, as suggested by @ypriverol.

sneumann commented 7 years ago

Hi, the discussion here is already quite advanced; I was looking for an answer to a much simpler question: which framework could one use for testing? The simplest option would be a set of docker run ... invocations together with known output, wrapped in e.g. https://github.com/kward/shunit2 (see also https://code.tutsplus.com/tutorials/test-driving-shell-scripts--net-31487) so that Jenkins can pick up the results. Does that make sense, or do we need to finish the testing scenarios first? Yours, Steffen
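A stripped-down version of that "docker run plus known output" idea, assuming nothing beyond POSIX sh (the assertEquals below is a local stand-in for the shunit2 function of the same name; with shunit2 installed you would source it and name your functions test* instead):

```shell
#!/bin/sh
# Jenkins-friendly sketch: compare a command's output against a known
# expected value, counting failures so the script can exit non-zero.
# assertEquals mimics the shunit2 assertion of the same name; the
# docker command in the usage note is a placeholder.
FAILURES=0

assertEquals() {
    # $1 = expected value, $2 = actual value
    if [ "$1" = "$2" ]; then
        echo "PASS"
    else
        echo "FAIL: expected '$1', got '$2'"
        FAILURES=$((FAILURES + 1))
    fi
}

# Example (image name and expected string are placeholders):
#   assertEquals "mytool 1.0.0" \
#       "$(docker run --rm biocontainers/mytool mytool --version)"

# Exit non-zero so CI (e.g. Jenkins) marks the build as failed:
#   exit $FAILURES
```

Real shunit2 adds test discovery and reporting on top of this pattern; the core idea is the same expected-vs-actual comparison.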

pcm32 commented 7 years ago

Yes, what I was aiming at with the question is more along the lines of what @sneumann mentions: at the implementation level, how would you execute the tests? Our proposals were along those lines; any opinions?

It is certainly interesting to take up the ideas that @ypriverol and others have developed here on the levels of testing (functional container, functional software, fully-tested software).

Again, the idea is not to force anyone to run the tests, but when someone asks how to do them, we should have good implementation guidelines in place.

bgruening commented 7 years ago

I'm in favor of simply running some docker run commands to see if the executable is actually executable :) In involucro we do it this way:

./involucro -v=2 -set TEST='moca --help' -set PACKAGE='moca' -set TAG='0.3.3--np111py27_0' -set VERSION='0.3.3' -set BUILD='np111py27_0' all

The test is a simple moca --help. If you want more tests, these need to be defined in the upstream package manager, but in general we trust the package manager that the tool works. I would stay away from testing functional software; this is a huge effort, and others are already doing it further downstream in the tool life cycle.

xref: https://github.com/BioContainers/specs/issues/69