BioContainers / specs

BioContainers specifications
http://biocontainers.pro
Apache License 2.0

Don't ship binaries and tarballs in repositories #6

Closed: bgruening closed this 9 years ago

bgruening commented 9 years ago

I would like to put the binaries/tarballs etc. into a public location, separated from the source (Dockerfile), or use the original source (if it is trustable). This will improve usability for us developers and dramatically shrink download times. Moreover, a few packages like PeptideShaker are too big to store on GitHub.

Maybe the EBI can sponsor some storage, or we can try to get some Google Drive running? The Galaxy project is already running such a service: http://depot.galaxyproject.org/. Most of our target tools should be there, or we can add them to the depot if you agree to work more closely with the Galaxy community.
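A minimal sketch of the Dockerfile pattern this would enable, assuming a hypothetical tool and depot path (the base image, URL, tool name, and version below are illustrative, not real depot entries):

```dockerfile
# Fetch the tarball from a public depot at build time instead of
# committing it to the specification repository.
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y wget

# Hypothetical depot URL; a real Dockerfile would point at an actual package.
RUN wget -q https://depot.galaxyproject.org/software/example-tool-1.0.tar.gz \
    && tar -xzf example-tool-1.0.tar.gz -C /opt \
    && rm example-tool-1.0.tar.gz

ENV PATH /opt/example-tool-1.0/bin:$PATH
```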

ypriverol commented 9 years ago

@bgruening for now, if we can allocate the space using http://depot.galaxyproject.org/, that would be fantastic. From the EBI, I will try to get some support, but it would mainly come after the project is mature enough. Also, it is important to keep the project a community effort, and I guess the Galaxy host is the best option for now. Opinions, @Leprevost @BioDocker/contributors? +1

prvst commented 9 years ago

I totally agree with removing sources and binaries from the repositories. I did it that way at first because it was more convenient to get started, and also because some of the tools still depend on SourceForge, whose service has been intermittent for the past weeks.

ypriverol commented 9 years ago

@bgruening @Leprevost if we agree on issue #7, we can remove this from here.

prvst commented 9 years ago

agreed

ypriverol commented 9 years ago

For now we will not support binaries or tarballs inside the containers; in the future we will try to provide this feature through other resources such as the Galaxy FTP or another server. Shall we close the issue and agree that we will only support GitHub sources for now?

sauloal commented 9 years ago

"only github sources". would that limit self hosted (by the creators) packages?

ypriverol commented 9 years ago

@sauloal the idea is that, as a project, we should provide a way of checking the quality of the containers. If the source server is not available, then we lose the reference to the package and some of the tools will fail; for the end user this process needs to be invisible, and I don't want to download something that is already broken. As a starting point, and for the health of @BioDocker, I think it would be good to have only healthy containers until we find a way of checking this automatically and removing the broken ones. What do you think?

sauloal commented 9 years ago

@ypriverol, I agree with the concept, but the implementation is a problem. For example, an assembler I've created a Docker container for can only be downloaded from the university's website after a request to the author. This is actually disturbingly common, because authors use the number of requests as an argument for further funding from the university or agencies.

That said, SourceForge would have been considered a good and stable repository until recently. For all we know, GitHub might be six months away from becoming SourceForge. Then what? GitHub as a preferred repository? Sure. Exclusive? ...

ypriverol commented 9 years ago

@sauloal @BioDocker/contributors As I said before, there is no perfect solution at the moment. In any case, those examples where you need authentication to download the source will not work with any approach, because you always need to provide the URL and users need to know how to subscribe, etc. We need to cover a set of containers and images that make things easy for the end user. @bgruening and I were talking about support from public servers and other community projects, such as Galaxy, to host the sources. We can start with those containers we can support with GitHub and then move on.

@sauloal if you already have a use case where the source isn't on GitHub, then we should think about how to support that. Any public server solution?

sauloal commented 9 years ago

@ypriverol, the Masurca genome assembler comes to mind ( http://www.genome.umd.edu/masurca_compile.html ). This is a case where you can only download the code from their website after emailing the developer.

Another example is Quorum ( http://www.genome.umd.edu/quorum.html ), an error-correction tool for NGS from the same group. Although you can download it directly (no need to ask for permission), the code is only available on the university's FTP site.

My point is, I don't think it should be set in stone whether we will exclusively accept GitHub-hosted software.

bgruening commented 9 years ago

@sauloal I don't think we can include Masurca or Quorum here or in any other package-management system unless we get permission to redistribute their tools without restrictions. People will learn over time that this is bad practice, and it will change, I'm pretty sure.

I think what @ypriverol is pointing out is that we should advertise stable download repositories. These can and will change over time, but for reproducibility we need a sustainable archive. I'm on your side in making it not GitHub-only, but we should also keep an eye on the URLs we include and move tarballs to more stable places. See my humble attempt here: https://github.com/bgruening/download_store

In the end we need a replicable, distributable object store, maybe based on torrents, so that every university can contribute to making packages sustainable and to making us independent of commercial hosting services.
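One building block for such an archive, sketched below, would be pinning every download to a checksum so the same tarball can be verified no matter which mirror (university, depot, torrent) serves it; the URL and the sha256 value here are placeholders, not a real stored artifact:

```dockerfile
# Verify the archived tarball against a known checksum before unpacking;
# replace the URL and the <sha256-of-release> value with real ones.
RUN wget -q https://example.org/store/example-tool-1.0.tar.gz \
    && echo "<sha256-of-release>  example-tool-1.0.tar.gz" | sha256sum -c - \
    && tar -xzf example-tool-1.0.tar.gz -C /opt
```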

ypriverol commented 9 years ago

@sauloal @bgruening My point is to do this in steps, because otherwise it becomes difficult to control how the project grows. We can start the project by supporting GitHub or other stable providers and keep the issue open until we find a long-term solution. We are already evaluating options, and this should be the aim now:

1. The source tarballs should be provided by formal/stable/long-term-support servers.
2. This is a community-driven effort: open source, free, etc. If a provider requires licenses or subscriptions, we need to think about the best way to support those use cases, but my guess is that it will be hard.
3. We should look for and propose solutions for this long-term server support. We talked about Galaxy and the EBI; any others?

If you agree, we can leave this issue open and look for options.

sauloal commented 9 years ago

@bgruening, I really liked your download_store idea. Regarding Quorum, you don't need to request it to be able to download it; it is open and available on the university's FTP.

@ypriverol, I completely agree with the spirit of your idea. I just think that there are still too many tools that are not on stable servers; this could really limit our reach. Could we somehow estimate how many programs from Galaxy are privately hosted?

That said, I second your proposal of starting only with GitHub, as long as we agree that it is a best practice and not a rule set in stone that we won't change.

Regarding your point 2, we should add a section describing how it is strictly forbidden to upload unlicensed programs.

Regarding your point 3, we can also try Google and Amazon, both of which have biology initiatives that could maybe host our data.

prvst commented 9 years ago

What about using GitHub itself to host the binaries? We could have a separate account or repository only for binaries, and inside the container we could just fetch the files.
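A rough sketch of how a Dockerfile could consume such a binaries repository (the BioDocker/binaries layout and file name here are hypothetical):

```dockerfile
# Pull a pre-built binary tarball from a dedicated GitHub "binaries"
# repository at build time; repository path and file name are examples.
RUN wget -q https://github.com/BioDocker/binaries/raw/master/example-tool/example-tool-1.0.tar.gz \
    && tar -xzf example-tool-1.0.tar.gz -C /opt
```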

sauloal commented 9 years ago

@Leprevost, that's exactly what @bgruening's repository does.

prvst commented 9 years ago

OK, let's move on with this point. I will create a repository called binaries inside the BioDocker group and start moving the binaries there. If everyone is in favor of using that solution for now, I will close this thread.

sauloal commented 9 years ago

+1

bgruening commented 9 years ago

:+1: