bgruening closed this issue 7 years ago
I was thinking about that lately; having the entrypoint could make it more transparent for people to execute the software, but for containers with more than one executable it becomes impossible, right?
Yes, it becomes hard, so we could write a universal entrypoint that has the same name as the container and passes all arguments through. We should discuss this and collect ideas. This is probably one of the most user-facing decisions.
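A minimal sketch of what such a universal entrypoint could look like (the file name, usage message, and demo are illustrative assumptions, not an agreed convention): the first argument names an executable inside the container and everything else is passed through unchanged.

```shell
# Hypothetical universal entrypoint for a BioDocker image.
# Inside an image it would live at e.g. /usr/local/bin/entrypoint.sh
# (the path is an assumption).
cat > entrypoint.sh <<'EOF'
#!/bin/sh
set -e
if [ "$#" -eq 0 ]; then
    echo "usage: docker run IMAGE TOOL [ARGS...]" >&2
    exit 1
fi
tool="$1"
shift
exec "$tool" "$@"   # TOOL must be on the container's PATH
EOF
chmod +x entrypoint.sh

# Demo: the wrapper forwards its arguments to the named tool
# ("echo" stands in for a real executable such as samtools).
./entrypoint.sh echo sort foo.bam
```

Wired up with `ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]` in the Dockerfile, `docker run IMAGE TOOL ARGS` would then behave like running `TOOL ARGS` directly.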
@BioDocker/contributors @bgruening my point here is that we need to keep this as simple as possible; some cases can overcomplicate the idea of a standard interface, for example software with internal calls to data or to other tools inside the container, or R containers that provide their own environment. We need to focus as much as possible on the way we provide the containers. @bgruening, how does Galaxy handle the communication between the different components? Is the way of communicating data the responsibility of the container, or the responsibility of the app?
Imho our containers should not care about communications. Containers should execute a program on some input data and should create some output data. I see Docker containers more or less like a big static binary. All the communication should be handled by specialised systems like Galaxy or Taverna - there, files are passed through the different components.
I agree with @bgruening on how communication is "secondary" (actually, "so cumbersome that we're better off staying out of it for a while"), but there might be "good practices" that could be implemented. For single-executable programs, as @Leprevost said, the executable could be called automatically in the entrypoint, but in other cases this is not feasible.
On the other hand, a list of "good practices" would be:

- never run as root; create a user biodocker with UID 1001 (so that different containers can read/write files)
- always set the workdir to /data
- always add the program to the PATH (no need for the full path)
- if there is a single executable, set CMD to --help
- in the base image (FROM), always lock the version (e.g.: ubuntu:12) to avoid incompatibility in the future
- never compile code in the shipped version (it bloats the image)
and follow the best practices outlined by docker: https://docs.docker.com/articles/dockerfile_best-practices/
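As a rough illustration only (the base image, UID, tool name, and paths below are placeholders, not a BioDocker decision), a Dockerfile following the practices above might look like:

```dockerfile
# Pin the base image version to avoid future incompatibilities
FROM ubuntu:14.04

# Never run as root: a fixed UID lets different containers share files
RUN useradd --create-home --uid 1001 biodocker

# Ship a pre-built binary instead of compiling inside the image
COPY samtools /usr/local/bin/samtools    # /usr/local/bin is on the PATH

USER biodocker
WORKDIR /data

# Single executable: default to the help text
ENTRYPOINT ["samtools"]
CMD ["--help"]
```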
@sauloal can you create an initial PR for best-practices? So we can start a discussion on this. Thanks a lot for driving this forward!
@Leprevost @ypriverol , could you please assign this issue to me?
@bgruening , as I've told @Leprevost , I'm writing the last section of my PhD thesis to be delivered in a month so please forgive me if it takes me a while to start this assignment.
@sauloal nice, I'm trying to submit my thesis next week :)
@sauloal ok for me
@Leprevost can you provide me with access to this repo or assign @sauloal to it?
Sorry about that, I didn't realize that the repositories were restricted; now all the collaborators have access to all content here.
I've been thinking. If entrypoints can only be used for single executables, then for consistency's sake wouldn't it be better to have no entrypoint and let the user call the program? As the program will be on the PATH, they just need to read the manual of the program (which they should know anyhow if they're planning on running it), and we add no complexity whatsoever.
I also think that a "universal caller" would add unnecessary complexity. If I have a script that calls PROGRAM1, I only have to change it to call biodocker PROGRAM PROGRAM1; if I have to call PROGRAM2, I run biodocker PROGRAM PROGRAM2. This way there is repetition, but less complexity and less change in existing pipelines. Also, it is consistent between programs having one or more executables inside.
One possibility would be to always add a help command to the container with the possible parameters, so that there is no need to check the Dockerfile. E.g.:
biodocker PROGRAM help   (prints e.g. "manual: http://abc.com")
biodocker PROGRAM PROGRAM1
biodocker PROGRAM PROGRAM2
/cc @BioDocker/contributors
@sauloal exactly. Imho it would be nice to use a container like a binary.
Usually samtools sort foo.bam,
and now docker run biodocker/samtools:dev sort foo.bam.
But I guess it would also be nice to have something like docker run biodocker/samtools:dev --help
or docker run biodocker/samtools:dev --version available.
What do you think?
@bgruening for a single executable that would be great, but for programs with multiple executables either we create multiple Dockerfiles, or we have one model for single-executable and another for multi-executable images. This creates complexity: every time you have to guess. I think that, for the sake of consistency, it is better to always have to name the program than to sometimes have to specify it and sometimes not.
So samtools sort foo.bam
becomes docker run biodocker/samtools:dev samtools sort foo.bam.
It is repetitive, but you can drop the docker run biodocker/samtools:dev in front of your existing commands without any other changes.
If you imagine bowtie with its several commands (bowtie, bowtie-build, bowtie-inspect), either we do:
docker run biodocker/bowtie:dev -a -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
docker run biodocker/bowtie-build:dev [options]* <reference_in> <ebwt_base>
docker run biodocker/bowtie-inspect:dev [options]* <ebwt_base>
which are exactly the same images except for the entrypoint,
or
docker run biodocker/bowtie:dev bowtie -a -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
docker run biodocker/bowtie:dev bowtie-build [options]* <reference_in> <ebwt_base>
docker run biodocker/bowtie:dev bowtie-inspect [options]* <ebwt_base>
That said, auxiliary scripts (called help and version, for example) can always be called:
docker run biodocker/bowtie:dev help
docker run biodocker/bowtie:dev version
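A sketch of how such auxiliary scripts could be generated. DEST defaults to ./bin here so the snippet runs anywhere; in an image it would be a directory on the PATH such as /usr/local/bin, and the manual URL and version string are placeholders, not real bowtie metadata.

```shell
# Generate tiny "help" and "version" auxiliary scripts for an image.
# DEST, the manual URL, and the version string are illustrative.
DEST="${DEST:-./bin}"            # in an image: /usr/local/bin
mkdir -p "$DEST"

cat > "$DEST/help" <<'EOF'
#!/bin/sh
echo "executables: bowtie, bowtie-build, bowtie-inspect"
echo "manual: http://abc.com"
EOF

cat > "$DEST/version" <<'EOF'
#!/bin/sh
echo "bowtie 1.1.2"
EOF

chmod +x "$DEST/help" "$DEST/version"
```

With these on the PATH, docker run biodocker/bowtie:dev help and docker run biodocker/bowtie:dev version work without any entrypoint.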
Sorry, I missed the important samtools in my previous post. This is exactly what I had in mind. +1 for everything. Also an interesting idea, these auxiliary scripts. Like it!
Fixed. Added CMD as the default in the Dockerfile example and best practices.
@Leprevost could you please close it?
Sure, though I'm not sure yet if we should do it before updating all Dockerfiles or after.
We can also use supervisor to work with multiple entrypoints:
http://stackoverflow.com/questions/18805073/docker-multiple-entrypoints
For servers: definitely, any other way is a hack :P
For command-line programs: not so much; it does not allow running either one command or the other (see the bowtie example).
What about the autostart option in the configuration file? If you set it to false, you need to execute the program manually, right?
I guess so, but again, only for servers, isn't it? Or do you mean you would like to have it installed by default?
I think you are right, this is not going to help in this case.
What's the update on this issue? Can we have the binaries on PATH and no ENTRYPOINT?
I guess we can close this one now.
If we have a spec for the ENTRYPOINT and the container name, this would be a first step to running arbitrary containers without looking up the documentation every time :)
For example:
docker run bioboxes/openms:2.0 FileMerger --help
vs. docker run bioboxes/openms:2.0 OpenMS FileMerger --help
or
docker run bioboxes/searchgui:0.27 --help
vs. docker run bioboxes/searchgui:0.27 searchgui --help