BioContainers / specs

BioContainers specifications
http://biocontainers.pro
Apache License 2.0

Spec for a unified ENTRYPOINT #8

Closed bgruening closed 7 years ago

bgruening commented 9 years ago

If we have a spec for the ENTRYPOINT and the container name this would be first step to run arbitrary containers without looking up the documentation everytime :)

For example: docker run bioboxes/openms:2.0 FileMerger --help vs. docker run bioboxes/openms:2.0 OpenMS FileMerger --help

or

docker run bioboxes/searchgui:0.27 --help vs. docker run bioboxes/searchgui:0.27 searchgui --help

prvst commented 9 years ago

I was thinking about that lately; having the entrypoint could make it more transparent for people to execute the software, but for containers with more than one executable it becomes impossible, right?

bgruening commented 9 years ago

Yes, it becomes hard, so we could write a universal entrypoint that has the same name as the container and passes all arguments through. We should discuss this and collect ideas. This is probably one of the most user-facing decisions.
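Such a universal entrypoint could be a tiny wrapper script baked into the image. A minimal sketch, assuming the script is installed under a name matching the container (the path and name here are illustrative, not part of any spec):

```sh
#!/bin/sh
# Hypothetical /usr/local/bin/openms, set as the image's ENTRYPOINT.
# It hands every argument to the requested executable inside the container, so
#   docker run bioboxes/openms:2.0 FileMerger --help
# ends up running `FileMerger --help` in the container.
exec "$@"
```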

ypriverol commented 9 years ago

@BioDocker/contributors @bgruening My point here is that we need to keep things as simple as possible; in some cases a standard interface can overcomplicate this, for example software that makes internal calls to data or to other tools inside the container, or R containers that provide their own environment. We need to focus on providing the containers as simply as possible. @bgruening, how does Galaxy handle the communication between the different components? Is the way of communicating data the responsibility of the container or of the app?

bgruening commented 9 years ago

Imho our containers should not care about communication. Containers should execute a program on some input data and should create some output data. I see Docker containers more or less as a big static binary. All the communication should be handled by specialised systems like Galaxy or Taverna - there, files are passed through the different components.

sauloal commented 9 years ago

I agree with @bgruening that communication is "secondary" (actually, "so cumbersome that we're better off staying out of it for a while"), but there might be "good practices" that could be implemented. For single-executable programs, as @Leprevost said, the executable could be called automatically in the entrypoint, but in other cases this is not possible.

On the other hand, a list of "good practices" would be:

- never run as root; create a user biodocker with uid 1001 (so that different containers can read/write files)
- always set the workdir to /data
- always add the program to the PATH (no need for the full path)
- if there is a single executable, set CMD to --help
- in the base image (FROM), always lock the version (e.g.: ubuntu:12) to avoid incompatibility in the future
- never compile code in the shipped version (it bloats the image)

and follow the best practices outlined by docker: https://docs.docker.com/articles/dockerfile_best-practices/
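Taken together, those practices could look something like the following Dockerfile sketch (tool name, paths, and the pinned tag are illustrative assumptions, not a mandated layout):

```dockerfile
# Sketch only: illustrates the good practices above, not an official template.
FROM ubuntu:12.04                       # always lock the base image version

# never run as root: fixed uid 1001 so different containers can share volumes
RUN useradd --create-home --uid 1001 biodocker
USER biodocker

WORKDIR /data                           # always set the workdir to /data

# the tool is assumed to be installed (not compiled here) and put on the PATH
ENV PATH="/opt/tool/bin:${PATH}"

# single-executable image: behave like the binary, default to the help text
ENTRYPOINT ["tool"]
CMD ["--help"]
```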

bgruening commented 9 years ago

@sauloal can you create an initial PR for best-practices? So we can start a discussion on this. Thanks a lot for driving this forward!

sauloal commented 9 years ago

@Leprevost @ypriverol , could you please assign this issue to me?

@bgruening , as I've told @Leprevost , I'm writing the last section of my PhD thesis to be delivered in a month so please forgive me if it takes me a while to start this assignment.

bgruening commented 9 years ago

@sauloal nice I'm trying to submit my thesis next week :)

prvst commented 9 years ago

@sauloal ok for me

bgruening commented 9 years ago

@Leprevost can you provide me with access to this repo or assign @sauloal to it?

prvst commented 9 years ago

Sorry about that, I didn't realize that the repositories were restricted; now all the collaborators have access to all content here.


sauloal commented 9 years ago

I've been thinking. If entrypoints can only be used for single executables, then for consistency's sake wouldn't it be better to not have an entrypoint and let the user call the program? As the program will be on the PATH, they just need to read the manual of the program (which they should know anyhow if they're planning on running it), and we add no complexity whatsoever.

I also think that a "universal caller" would add unnecessary complexity. If I have a script that calls PROGRAM1, I only have to change it to call biodocker PROGRAM PROGRAM1; if I have to call PROGRAM2, I run biodocker PROGRAM PROGRAM2. This way there is repetition but less complexity and less change to existing pipelines. It is also consistent between programs having one or more executables inside.

One possibility would be to always add a help command to the container listing the possible parameters, so that there is no need to check the Dockerfile. E.g.: biodocker PROGRAM help (printing something like "manual: http://abc.com"), biodocker PROGRAM PROGRAM1, biodocker PROGRAM PROGRAM2.

/cc @BioDocker/contributors

bgruening commented 9 years ago

@sauloal exactly. Imho it would be nice to use a container like a binary. Usually you run samtools sort foo.bam, and now it becomes docker run biodocker/samtools:dev sort foo.bam.

But I guess it would be also nice to have something like docker run biodocker/samtools:dev --help or docker run biodocker/samtools:dev --version available.

What do you think?
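For the single-executable case this could be expressed with an ENTRYPOINT plus a default CMD. A hedged sketch, assuming samtools is already installed on the image's PATH:

```dockerfile
# Sketch: make the container behave like the samtools binary itself.
ENTRYPOINT ["samtools"]
# With no arguments, `docker run biodocker/samtools:dev` prints the help;
# `docker run biodocker/samtools:dev sort foo.bam` replaces CMD and runs sort.
CMD ["--help"]
```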

sauloal commented 9 years ago

@bgruening. For a single executable that would be great, but for programs with multiple executables we either create multiple Dockerfiles or we have to create one model for single executables and one model for multiple executables. This creates complexity: every time you have to guess. I think that, for the sake of consistency, it is better to always have to name the program explicitly than to sometimes have to specify it and sometimes not.

so samtools sort foo.bam becomes docker run biodocker/samtools:dev samtools sort foo.bam

It is repetitive, but you can prepend docker run biodocker/samtools:dev to your existing command without any more changes.

If you imagine bowtie with its several commands (bowtie, bowtie-build, bowtie-inspect), either we do:

docker run biodocker/bowtie:dev -a -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
docker run biodocker/bowtie-build:dev [options]* <reference_in> <ebwt_base>
docker run biodocker/bowtie-inspect:dev [options]* <ebwt_base>

which are exactly the same images except for the entrypoint,

or:

docker run biodocker/bowtie:dev bowtie -a -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
docker run biodocker/bowtie:dev bowtie-build [options]* <reference_in> <ebwt_base>
docker run biodocker/bowtie:dev bowtie-inspect [options]* <ebwt_base>

That said, auxiliary scripts (called help and version, for example) can always be called:

docker run biodocker/bowtie:dev help
docker run biodocker/bowtie:dev version
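A minimal sketch of such an auxiliary dispatcher: everything here is an assumption for illustration (in a real image the script would be installed and would exec the bundled binaries; here `echo` stands in for exec so the sketch runs anywhere, and the URL and version string are placeholders):

```shell
#!/bin/sh
# Hypothetical dispatcher a bowtie image could install as its ENTRYPOINT.
# `help` and `version` are handled as auxiliary commands; anything else is
# treated as the name of a bundled executable followed by its arguments.
run_tool() {
    # In a real image this line would be: exec "$@"
    echo "would exec: $*"
}

dispatch() {
    case "$1" in
        help)    echo "manual: http://abc.com" ;;   # placeholder URL
        version) echo "0.0.0-placeholder" ;;        # placeholder version
        *)       run_tool "$@" ;;
    esac
}

dispatch help
dispatch bowtie-build reference.fa e_coli
```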

bgruening commented 9 years ago

Sorry, I missed the important samtools in my previous post. This is exactly what I had in mind. +1 for everything. Also interesting idea of these auxiliary scripts. Like it!

sauloal commented 9 years ago

fixed. added CMD as the default in the dockerfile example and best practices

@Leprevost . could you please close it?

prvst commented 9 years ago

Sure, though I'm not sure yet whether we should close it before updating all the Dockerfiles or after.

prvst commented 9 years ago

We can also use supervisor to work with multiple entrypoints:

http://blog.trifork.com/2014/03/11/using-supervisor-with-docker-to-manage-processes-supporting-image-inheritance/

http://stackoverflow.com/questions/18805073/docker-multiple-entrypoints
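For the server case, the supervisor approach boils down to a config file plus making supervisord the container's command. A sketch under the usual assumptions (file path and program names are illustrative):

```
; /etc/supervisor/conf.d/services.conf -- illustrative sketch only
[supervisord]
nodaemon=true            ; keep supervisord in the foreground as PID 1

[program:webapp]
command=/opt/tool/bin/server --port 8080
autostart=true           ; start with the container
autorestart=true         ; restart the process if it dies
```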

sauloal commented 9 years ago

For servers: definitely, any other way is a hack :P

For command-line programs: not so much; it does not allow running either one command or the other (see the bowtie example).

prvst commented 9 years ago

What about the autostart option in the configuration file? If you set it to false you need to execute the program manually, right?

sauloal commented 9 years ago

I guess so, but again, that's only for servers, isn't it? Or do you mean you would like to have it installed by default?

prvst commented 9 years ago

I think you are right, this is not going to help in this case.

prvst commented 8 years ago

What's the update on this issue? Can we have the binaries on the PATH and no ENTRYPOINT?

sauloal commented 8 years ago

@bgruening

excellent examples:

https://denibertovic.com/posts/handling-permissions-with-docker-volumes/

https://stackoverflow.com/questions/23544282/what-is-the-best-way-to-manage-permissions-for-docker-shared-volumes
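The first link describes an entrypoint that creates a user matching the host's uid at container start, then drops privileges before running the tool, so files written to a mounted volume belong to the host user. A sketch of that pattern (assuming gosu is installed in the image; the LOCAL_USER_ID variable name follows the post, but treat all names here as assumptions):

```sh
#!/bin/sh
# Sketch of the host-uid-matching entrypoint pattern from the linked post.
# The container starts as root, creates a user with the uid passed in via
#   docker run -e LOCAL_USER_ID=$(id -u) ...
# and then steps down to that user before executing the actual command.
USER_ID=${LOCAL_USER_ID:-1001}
useradd --shell /bin/bash --uid "$USER_ID" --non-unique --create-home user
export HOME=/home/user
exec gosu user "$@"
```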

bgruening commented 7 years ago

I guess we can close this one now.