BioContainers / specs

BioContainers specifications
http://biocontainers.pro
Apache License 2.0

One single repository for containers #7

Closed: bgruening closed this issue 9 years ago

bgruening commented 9 years ago

If we agree to follow https://github.com/BioDocker/specifications/issues/6 we could think about having one repository for all Dockerfiles. In our experience this has many advantages for a developer, like bulk updates, unified maintenance (travis, contribution.md, issues) and one big grep-able repository.

prvst commented 9 years ago

@bgruening This is a great idea. I thought about that in the past but, I'm not sure why, I decided to create a separate repository for each container. I think it was because of the automated build, but now I know how to configure that to work on separate Dockerfiles in different folders. What do you think about that, @ypriverol? I'm in favor of changing to a single repo.

ypriverol commented 9 years ago

@bgruening can we set up the Galaxy repo? If yes, please close the issue. Any other options from the other @BioDocker/contributors?

prvst commented 9 years ago

@bgruening you mean creating a repo for Galaxy on BioDocker? OK for me. I'm planning to create a single repo called Proteomics or Proteomics-tools, and then move all the other folders inside. Having that set up, I can reinitiate the automatic build at DockerHub.

bgruening commented 9 years ago

@Leprevost why not create a bioboxes repository and move everything into it? (By everything I mean no binaries; we should keep them out of the git history, if possible.) @ypriverol I'm not sure Galaxy fits here. It's more or less a complete Galaxy server with FTP and scheduler, all in one, not a single application. I see it more as Galaxy using the BioDocker containers to resolve its dependencies. Does this make sense?

ypriverol commented 9 years ago

Agree with you. BioDocker is about the containers and not the way of connecting them. The Galaxy idea is about having a place for the sources.

prvst commented 9 years ago

OK, I will create the Proteomics repository then, and start moving the deploy instructions

ypriverol commented 9 years ago

@bgruening I also see that Galaxy will use BioDocker containers to resolve the dependencies. But for the single repository (for the sources of the software we will use), I would like to have a place on the Galaxy servers, if possible. All the tagging, download, search, etc. can be done using Docker Hub. In any case, we can also continue with GitHub.

prvst commented 9 years ago

Hey @BioDocker/contributors, I'm moving all folders into a single repository and reconfiguring all the automatic build directives on DockerHub. The containers will be unavailable for a few hours; I hope to finish everything tonight.

prvst commented 9 years ago

Done. The containers are all building; some of them are finished, others will take some time to become available. We now have a central repository called Proteomics. Inside it we have a folder for each software, and inside that a folder for each version containing a Dockerfile. We still have the binaries; I'm planning to move them over time, or all at once if we get some place to store them.
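For illustration only, the layout described here (one folder per software, one sub-folder per version, each holding a Dockerfile) can be walked with a few lines of Python. This is a minimal sketch, not part of the actual repository: the function names, the `toolb` entry, and all version numbers are hypothetical.

```python
from pathlib import Path
import tempfile


def find_dockerfiles(repo_root):
    """Return sorted (software, version) pairs for every
    <software>/<version>/Dockerfile found under repo_root."""
    root = Path(repo_root)
    return sorted(
        (dockerfile.parent.parent.name, dockerfile.parent.name)
        for dockerfile in root.glob("*/*/Dockerfile")
    )


def make_example_repo(repo_root):
    """Create a tiny example layout. All names and versions below are
    hypothetical placeholders, not the real repository contents."""
    entries = [("peptideshaker", "1.0"), ("peptideshaker", "1.1"), ("toolb", "0.1")]
    for software, version in entries:
        version_dir = Path(repo_root) / software / version
        version_dir.mkdir(parents=True)
        # Placeholder content; a real Dockerfile would hold build instructions.
        (version_dir / "Dockerfile").write_text("# placeholder\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_example_repo(tmp)
        for software, version in find_dockerfiles(tmp):
            print(f"{software}/{version}/Dockerfile")
```

A listing like this is also the kind of thing that makes bulk updates or a grep over all Dockerfiles straightforward in a single repository.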

ypriverol commented 9 years ago

@Leprevost @bgruening I'm not sure this approach is the best option. I will explain my idea:

bgruening commented 9 years ago

I'm also -1 on this change, but for different reasons :)

> About the sources and binaries: if they are not on GitHub, we should try to put them in a central repository. I was talking with @bgruening about the possibility of Galaxy, but we would need more people for that. It would allow us to keep control of the sources and binaries if the original repo of the software disappears (like SourceForge).

+1 for this. Binaries should not be part of the repository. Text only. And we should make this mandatory from the beginning. Removing them later from the git history is cumbersome.

> One place for the containers: this approach (as implemented now) is not suitable if the user is interested in one particular container. Having one repository for all the containers will make it difficult to find, inspect, and use individual containers. This implementation also does not represent our main goal: providing the containers and leaving to the platforms (Galaxy, TPP, OpenMS) the way of connecting them.

I don't fully understand this point. You can perfectly well build a Docker container only for PeptideShaker automatically via DockerHub, even if you have multiple Dockerfiles in one repository. Was this your concern?

> Organising by categories like Proteomics, Genomics... will lead us into a difficult scenario: when a user is interested in one particular program that is more generic, such as EMBOSS, they will need to look inside all the categories. This classification should be done using a tagging system.

I totally agree with this, but I vote for one repository, BioDocker, and putting everything under this repository for the reasons outlined above and because we should bring communities together. Proteomics, Transcriptomics, Imaging ... we need to get away from the distinctions. It's all about data and integration.

> My idea: keep the Docker containers as separate repositories, and continue moving them to the original GitHub repos where the developers can maintain them, or have a BioDocker repo where we put all the containers, not only those from proteomics.

Can you elaborate on this more? Is this not exactly the one-repo, multiple-Dockerfiles idea?

prvst commented 9 years ago

OK, @ypriverol and @bgruening, you have good points, and I agree with some of them. I personally think that a central repository is easier to maintain. A problem I noticed in the past with having several Docker repositories is that it makes it harder to find stuff on the GitHub page. If we grow too much, it will get to a point where we need some sort of list to keep track of it and to show people what we have, and I think that's not practical. With a central repository, everything is listed as folders inside the central repo; you just need to open it to see all the programs there.

We have to keep in mind that the repositories are not for the "users", they are for the developers. My second point is that the repositories are not going to be updated that much, because they are going to hold just Dockerfiles, right? So I don't think we will have too much movement in the group, maybe only when new software is released or a new version is launched.

I agree about changing the name from Proteomics to Biodocker. I will do that today.

ypriverol commented 9 years ago

+1 OK, let's change it to biodocker or something similar. @Leprevost, can you please capture this discussion in the specification and update it? Then you can close the issue. Perhaps it is better to use biodockers.

ypriverol commented 9 years ago

biodocker is fine

prvst commented 9 years ago

done

sauloal commented 9 years ago

One idea: subfolders could be created for "genomics", "transcriptomics", "proteomics", "others", with symbolic links inside each of them pointing to the folder in the global list. This way everybody wins, except when symlinks break.
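A rough sketch of how that could look, assuming a POSIX filesystem where symlinks are available. The helper names `link_into_category` and `broken_links` are hypothetical, and the second one only exists to make the "except when symlinks break" case visible.

```python
import os
import tempfile
from pathlib import Path


def link_into_category(repo_root, category, software):
    """Create <category>/<software> as a relative symlink to the
    top-level <software> folder and return the link path."""
    root = Path(repo_root)
    target = root / software
    category_dir = root / category
    category_dir.mkdir(exist_ok=True)
    link = category_dir / software
    # A relative target keeps the link valid when the repo is cloned elsewhere.
    link.symlink_to(os.path.relpath(target, category_dir), target_is_directory=True)
    return link


def broken_links(category_dir):
    """Return the names of symlinks in category_dir whose target is gone."""
    return sorted(
        entry.name
        for entry in Path(category_dir).iterdir()
        if entry.is_symlink() and not entry.exists()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "emboss").mkdir()  # folder in the global list
        link_into_category(root, "genomics", "emboss")
        print(broken_links(root / "genomics"))
        # Renaming the real folder leaves the category link dangling.
        (root / "emboss").rename(root / "emboss-old")
        print(broken_links(root / "genomics"))
```

A check like `broken_links` could run in CI so that renaming or removing a tool folder does not silently leave dead category entries behind.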