DEpt-metagenom / containerized_genomics_tools

This repository hosts the files necessary to build and test containers for genomics tools

Testing #6

Open kovcsboti opened 1 week ago

kovcsboti commented 1 week ago

@gpetho In the Makefile we run a test; however, the tool I'm trying to dockerize has built-in testing with its own test data. The Dockerfile I used as a template runs that test in the final stage to verify that the installation is correct.

Should the Makefile also contain a test, or is it redundant in this case?

gpetho commented 1 week ago

I don't think there is any value in testing the built Docker container again in that case, so I would say no as far as that container is concerned. However, what we still need to test is the Apptainer container. I have read notes by people saying some Docker container was not compatible with Apptainer, or in some cases even with some specific versions of Apptainer (e.g. everything either before or after a specific Apptainer version number). So the fact that the application is installed correctly while the Docker container is being built, and that the Docker container works fine as well, does not guarantee that the Apptainer container built from the Docker container will work as well.

It would be ideal if you could reuse the test data that are used for testing while the Docker image is being built for testing the Apptainer container. This obviously depends on what kind of test that is. Based on what you are writing, I assume that the test verifies the correct operation of the entire application as a whole on one or more input files, presumably based on comparison to reference files or reference strings specified within the test scripts. If this is indeed the case, I suppose the test procedure can be extracted from the Dockerfile (perhaps relying on language models to help identify exactly what needs to be done to achieve this) and rerun outside of the docker build process after the Apptainer image has been built. On the other hand, if that testing means that a suite of unit tests is run, then that's not what we need.
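As a rough illustration of rerunning the same test outside the docker build, the sketch below builds the command lines that would run one tool invocation on the same host directory under Docker and under Apptainer. The function names, the `/data` mount point, and the example image and tool names are hypothetical. It also assumes the Docker image has no `ENTRYPOINT`, so that `tool_args` (starting with the executable) means the same thing in both runtimes.

```python
from pathlib import Path


def docker_command(image, data_dir, tool_args):
    """Build a `docker run` command that mounts data_dir at /data (assumed mount point)."""
    # docker -v needs an absolute host path; a relative one is treated as a volume name.
    host_dir = Path(data_dir).absolute()
    return ["docker", "run", "--rm", "-v", f"{host_dir}:/data", image, *tool_args]


def apptainer_command(sif_file, data_dir, tool_args):
    """Build an `apptainer exec` command that binds data_dir at /data (assumed mount point)."""
    host_dir = Path(data_dir).absolute()
    return ["apptainer", "exec", "--bind", f"{host_dir}:/data", str(sif_file), *tool_args]


if __name__ == "__main__":
    # Hypothetical tool invocation; replace with the command extracted from the Dockerfile.
    args = ["mytool", "--input", "/data/input.fasta", "--output", "/data/out"]
    print(" ".join(docker_command("example/mytool:1.0", "/srv/testdata", args)))
    print(" ".join(apptainer_command("mytool.sif", "/srv/testdata", args)))
```

Either list can be passed to `subprocess.run(..., check=True)`, so the same inputs and arguments are used for both container types.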

kovcsboti commented 1 week ago

I don't think the test uses a reference; it simply runs the tool with the basic parameters (as far as I can tell). However, we could use the test's output as a reference. I downloaded the inputs as well, so it's possible to compare the built-in test output with the output of a simple run (using the same input).

gpetho commented 1 week ago

> it simply runs the tool with the basic parameters

So it just checks whether the tool crashes? In that case, that is not sufficient for us. We have seen cases where a Docker image seemed to run successfully and produced output, but that output was incomplete. PHASTEST in particular did this when the installation of the database had failed when the image was either built or first run. So the testing is supposed to make sure not just that the tool runs and exits without errors but also that the output is correct and complete. I agree that the workflow that you described is the right way to do it. Run the tool on the provided input data before the tool is put within the container, check whether the output appears complete and correct, and if yes, treat it as reference output, then rerun the tool in the Docker and then the Apptainer version on the same input data and check whether the output is identical to this reference.
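A minimal sketch of the "identical to this reference" check, assuming the tool writes its results into a directory and that its output is deterministic (no timestamps or random ordering inside the files). The function names are hypothetical. Reporting missing files separately is meant to catch the incomplete-output case described above, where the tool exits without errors but some results are absent.

```python
import hashlib
from pathlib import Path


def file_digests(directory):
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_to_reference(reference_dir, output_dir):
    """Return sorted lists of (missing, unexpected, differing) file names."""
    reference = file_digests(reference_dir)
    output = file_digests(output_dir)
    # Files the reference run produced but the container run did not.
    missing = sorted(set(reference) - set(output))
    # Files only the container run produced.
    unexpected = sorted(set(output) - set(reference))
    # Files present in both runs whose contents are not byte-identical.
    differing = sorted(
        name for name in set(reference) & set(output) if reference[name] != output[name]
    )
    return missing, unexpected, differing
```

A test passes only if all three lists are empty. If the tool embeds run-specific details in its output, those files would need to be normalized or excluded before hashing.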

gpetho commented 1 week ago

Question received from @kovcsboti by email (translated by LLM, accuracy checked by me):

I have a question: I’ve created the Dockerfile and the Makefile, and I also have the reference and input files for testing (I 'collected' these from different sources, and ChatGPT helped as well). How can I test them before submitting a pull request? Should I just run them? I saw in a conversation with the interns on GitHub that in the case of 'sra_toolkit,' they used a dummy Docker Hub. I’m sure this was mentioned before, but honestly, I don’t remember.

You can submit a pull request anytime you think that you are ready with something and want somebody (probably either me or Levente) to take a look at it, so this is independent of the question of how and when to test the containers that you have created.

I'm not sure I understand what you mean exactly, but basically testing your Makefile and Dockerfile consists of running all make targets one after the other (make build, make test, etc.) on some Linux machine and verifying manually whether they run as they should and produce the expected output. make test should test the Docker and Apptainer image, so that target covers that part of your question. To test whether make pull, make push and make build_apptainer work, I suppose the best option is to create an account for yourself on Docker Hub (it's free) and push the built Docker image there, and pull both images from there before testing them. When you have verified that everything works, let me know, and I'll adjust the variable that holds the Docker Hub repository's URL in the Makefile and push the image to the institute's Docker Hub.
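Running the targets one after the other can be sketched as a small helper that stops at the first failing target. This is only an illustration: the function name is hypothetical, and the order of targets in the usage example is an assumption, not something defined by the repository's Makefile. The manual check of the output is still needed afterwards.

```python
import subprocess


def run_make_targets(targets, workdir=".", runner=subprocess.run):
    """Run each make target in order; return the first failing target, or None."""
    for target in targets:
        # Each target runs as a separate `make <target>` call in workdir.
        result = runner(["make", target], cwd=workdir)
        if result.returncode != 0:
            return target
    return None


if __name__ == "__main__":
    # Assumed order; adjust to match the dependencies in the actual Makefile.
    failed = run_make_targets(["build", "push", "pull", "build_apptainer", "test"])
    print("all targets succeeded" if failed is None else f"target failed: {failed}")
```

The `runner` parameter exists only so the helper can be tried without invoking make; by default it calls `subprocess.run`.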

kovcsboti commented 1 week ago

> I suppose the best option is to create an account for yourself on Docker Hub (it's free) and push the built Docker image there, and pull both images from there before testing them. When you have verified that everything works, let me know, and I'll adjust the variable that holds the Docker Hub repository's URL in the Makefile and push the image to the institute's Docker Hub.

Yes, that was the crux of my question. Basically, I have something that is seemingly close to finished, but without testing I can't be sure that it actually works. Setting up a Docker Hub account sounds like the best solution. Thanks.