Open gpetho opened 1 month ago
I could pull and run the container with both Docker and Apptainer without any problems.
The repository also contains the .sif
file, which is unusual. I do not really understand the purpose of Apptainer.def
. The already existing image could simply be pulled from DockerHub without needing this definition file and without sudo permissions. I would change apptainer build $(SIF_FILE) $(APPTAINER_DEF)
to apptainer pull $(SIF_FILE) $(DOCKERHUB)
with DOCKERHUB
set to docker://deptmetagenom/fastp:0.23.4
in the Makefile
.
It could also be also convenient later to set the latest tag in DockerHub so that the latest image is always pulled by default and the tag (version) does not have to be set when the image is pulled with Apptainer. At the moment, of course, this is not really a problem as there is only one version in the DockerHub repository, but it might be a practice we should take up.
When running the container shell, fastp is not available in the $PATH
, which could be a minor inconvenience. The entry point is set correctly, so we should remember to use fastp with this container by running docker run fastp
or apptainer run fastp.sif
, or start fastp
by specifying its path when using apptainer exec
(i.e. apptainer exec fastp.sif /app/fastp
). I am describing this because although setting the entry point is the correct way, it is not always followed and fastp not being available in the$PATH
could lead to some confusion.
I am just asking out of curiosity: is there a particular reason why you used Alpine as the base image? Are there any advantages to using this base aside from it being very lightweight?
Thank you for reviewing this. So basically there is nothing else needed, we can proceed
The repository also contains the
.sif
file, which is unusual.
Yes, well, it's small, so it doesn't hurt to put it in the repository. I'm keeping it there and will be putting the .sif
files in the other repositories as well as long as they are so small.
I do not really understand the purpose of
Apptainer.def
. The already existing image could simply be pulled from DockerHub without needing this definition file and without sudo permissions. I would changeapptainer build $(SIF_FILE) $(APPTAINER_DEF)
toapptainer pull $(SIF_FILE) $(DOCKERHUB)
withDOCKERHUB
set todocker://deptmetagenom/fastp:0.23.4
in theMakefile
.
ChatGPT told me to do it like this, and it worked, so I saw no reason to change it. The reason to use a .def
file might have something to do with the fact that the entrypoint (%runscript
) is set in it. Maybe it would work like this if I just pulled the Docker image with apptainer, and it would simply adopt the Docker image's entrypoint, I haven't tried.
When running the container shell, fastp is not available in the
$PATH
, which could be a minor inconvenience. The entry point is set correctly, so we should remember to use fastp with this container by runningdocker run fastp
orapptainer run fastp.sif
, or startfastp
by specifying its path when usingapptainer exec
(i.e.apptainer exec fastp.sif /app/fastp
). I am describing this because although setting the entry point is the correct way, it is not always followed and fastp not being available in the$PATH
could lead to some confusion.
As I said, I will be adding a readme which will explain exactly this, i.e. that the Docker and Apptainer containers need to be run like this, furthermore adding information on mounting the host directory that contains the input and where the output will be saved.
I am just asking out of curiosity: is there a particular reason why you used Alpine as the base image? Are there any advantages to using this base aside from it being very lightweight?
No, that's the (only) reason why everybody uses alpine as the base image. The image should be as small as possible.
ChatGPT told me to do it like this, and it worked, so I saw no reason to change it. The reason to use a .def file might have something to do with the fact that the entrypoint (%runscript) is set in it. Maybe it would work like this if I just pulled the Docker image with apptainer, and it would simply adopt the Docker image's entrypoint, I haven't tried.
I have and the entrypoint is "adopted", the Docker --> Apptainer works flawlessly here. There is no need to set the entrypoint again. If using the .def
file is not absolutely necessary and the conversion works I would definitely just pull the Docker image.
I have and the entrypoint is "adopted", the Docker --> Apptainer works flawlessly here. There is no need to set the entrypoint again.
Right, then there is no point in building the Apptainer image from .def, just pulling it does the same. Ultimately it doesn't matter, we get the exact same result either way, but I'll change it to apptainer pull
as requested.
This is ChatGPT's much more detailed answer to your original question:
The key difference between pulling a Docker image with
apptainer pull docker://mydockerimage:latest
and building an Apptainer image using a minimal.def
file like the one you provided lies in flexibility, control, and customization. Here’s a breakdown of both approaches and whether building an image offers any benefits:Pulling a Docker Image with Apptainer
When you use the command
apptainer pull docker://mydockerimage:latest
, Apptainer directly converts the Docker image into an Apptainer.sif
image. The result is an Apptainer container that mirrors the Docker image, including its layers, entrypoint, environment variables, and any default commands.Pros:
- Simplicity: It’s fast and straightforward. You get the Docker image as-is, with no modifications required.
- Automatic conversion: Apptainer handles the conversion for you, so there’s no need to define anything in a
.def
file.- Minimal overhead: This method doesn’t require extra steps to configure or set up; it just grabs the Docker image and creates the Apptainer
.sif
file.Cons:
- Lack of customization: You can’t modify or customize the Docker image before converting it to Apptainer. For example, if you want to change environment variables, add files, or run setup commands, this approach doesn't allow that easily.
Building an Apptainer Image with a
.def
FileUsing a
.def
file with Apptainer gives you more control and customization over the image build process. Your.def
file specifies how the Apptainer image is constructed from the Docker base image, and it can include additional steps during the%post
and%runscript
phases.Here’s what happens:
- Bootstrap: The
From
command pulls the base Docker image (mydockerimage:latest
).- Customization: You can add extra setup or configuration commands in the
%post
section (such as installing additional software, modifying the environment, etc.).- Custom run behavior: You can specify custom commands or scripts that run when the container is executed via the
%runscript
section.Pros:
- Customization: You have complete control over how the Apptainer image is built. You can:
- Add or modify files.
- Install additional dependencies.
- Change entry points or default behavior.
- Flexibility: If you need to make changes to the Docker image, even minor adjustments like setting environment variables or adding configuration files, this method allows for that.
- Pre-build tasks: You can automate tasks like setting permissions, downloading datasets, or other pre-processing steps during the build process.
Cons:
- More complexity: It requires a bit more effort since you’re defining your image build process.
- No real benefit for identical images: If you don’t need to make any changes to the Docker image (i.e., you want the Apptainer image to run exactly like the Docker image), this method may not offer significant advantages.
Key Differences
Customization:
- Pulling a Docker image gives you no opportunity to change or customize anything.
- Using a
.def
file allows for full customization of the image, both during the build and run phases.Entrypoint and Run Behavior:
- When you pull the Docker image, Apptainer respects the Docker image’s entrypoint and run configuration by default.
- In the
.def
file, you can override or modify the entrypoint with the%runscript
section, but you don’t have to. You can leave the image’s behavior the same by simply usingexec "$@"
in the%runscript
(i.e., forwarding the same behavior as the Docker container).Pre/Post Setup:
- Pulling doesn’t allow for pre/post setup actions like installing extra packages or modifying configurations.
- The
.def
file enables you to add any setup logic during the%post
phase.Conclusion: Which to Use?
- Use
apptainer pull docker://...
if you just want to convert a Docker image to Apptainer format quickly and don’t need any modifications.- Use a
.def
file if:
- You need to customize the image.
- You want to add extra dependencies or configuration.
- You require more control over the build process.
If you don’t need to customize or modify the Docker image in any way, there’s no practical benefit to using the
.def
file over a simpleapptainer pull
. However, if you foresee needing any flexibility down the line, the.def
approach gives you the ability to modify the image as necessary.
@laczkol Commit https://github.com/DEpt-metagenom/containerized_genomics_tools/commit/a387040ff0060b105736f16c2a32b37c733dc2ad should have solved the problems that you indicated. Please check whether anything is missing or incorrect.
I can see that you added the target push
as a prerequisite for build_apptainer
in the Makefile. Since the prerequisite for push
is build
, I assume you added this to ensure that a new Docker image is built and pushed to the DockerHub repository. This makes perfect sense and ensures consistency. We could think about having a build_apptainer_only
target (like build_apptainer
was implemented before the last commit), so that we don't get an error if a host doesn't have Docker and the only goal is to use Apptainer. Since the .sif
file can be easily copied and is also included in the repo, I don't know how important this use case is, probably not very much or it doesn't make sense to include it as a separate target in the Makefile. Other than that I've no further comments, I think the structure and Makefile are fine. The README is detailed enough for me.
I'd wait for @zradai's comment before finalizing it.
I can see that you added the target
push
as a prerequisite forbuild_apptainer
in the Makefile. Since the prerequisite forpush
isbuild
, I assume you added this to ensure that a new Docker image is built and pushed to the DockerHub repository. This makes perfect sense and ensures consistency.
Yes, I agree that it was a smart idea to implement the workflow like this. I didn't think about it myself, it was @albertbokor-unideb who suggested this solution to me when we discussed my original Makefile, so I have to give him credit for it.
We could think about having a
build_apptainer_only
target (likebuild_apptainer
was implemented before the last commit), so that we don't get an error if a host doesn't have Docker and the only goal is to use Apptainer. Since the.sif
file can be easily copied and is also included in the repo, I don't know how important this use case is, probably not very much or it doesn't make sense to include it as a separate target in the Makefile.
I think this is definitely a good suggestion for tools where the .sif
file is too large to be included in the repository. I didn't think of the situation that the user might not have Docker installed when I wrote the Makefile.
As an alternative to the make build_apptainer_only
target, which achieves what we need, of course, but sounds kind of lame to me, I think it might be better to modify the build_apptainer
target so that it runs make push
normally, except when it is called like this: make build_apptainer PUSH=false
, in which case the push is skipped. This just requires the addition of a simple condition in the build_apptainer
target.
@zradai We're done with this part, it's your turn, please review fastp
and let me know whether you are happy with all further tools being structured like this or whether you want anything changed. I suppose you probably are happy with it since you basically just wanted the .sif
file, which is there (and will be moved to the specified directory next), the rest was our idea.
I have added the
fastp
tool to the repository.@laczkol @zradai Please check whether this is organised according to what you have requested apart from the fact that the
.sif
file has not been moved to the ZFS directory yet. The repository has a Dockerfile, an Apptainer .def file, a Makefile that pulls, builds and pushes the Docker image to and from the institute Docker Hub, contains abuild_apptainer
goal to build the Apptainer image obviously, and atest
goal to test both images by running the tool on an input file and comparing the output to the reference output included in the repository.Is there anything missing apart from a readme, which will be added later? If it is correct like this, the same pattern will be followed for all other tools.