cggh / biipy

Docker image for bioinformatics analysis.
MIT License

docker hub times out build v1.7.0 #22

Closed hardingnj closed 8 years ago

hardingnj commented 8 years ago

This is due to addition of simupop, which takes ages.

Should we prune the Dockerfile or move to a push model?

hardingnj commented 8 years ago

For the moment, I've removed simupop; we can think about how to address this later.

alimanfoo commented 8 years ago

Thanks Nick, I have no immediate plans to use simupop so fine to remove. At some point in the not-too-distant future we might consider using bioconda to install binaries instead of installing everything from source via pip, but that needs some investigation, I haven't tried bioconda yet.

On Tuesday, March 1, 2016, Nick Harding notifications@github.com wrote:

For the moment, I've removed simupop; we can think about how to address this later.


Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com alimanfoo@gmail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721

hardingnj commented 8 years ago

I think I'm going to have a go with conda/bioconda.

We have some decisions to make though.

  1. Base our docker image on Continuum's anaconda image, which is on Debian. This means that we start with a lot of the heavy lifting done, then we can use conda to install from the bioconda channel.
  2. Use the bioconda docker image for their installation environment, which is on CentOS 5.
  3. Start with Ubuntu, install miniconda (a stripped-down anaconda) and do everything from scratch. I think with this option we may run into timeout issues again.

Bioconda is something I wasn't aware of. For several of the things in biipy, we may want to think about writing recipes for conda/bioconda. Jerome has done this for msprime. It doesn't seem to be a lot of additional work, very similar to what you (AM) did with basemap/treemix.

hardingnj commented 8 years ago

I think my preference is for 1. It does require us hitching our wagon to anaconda, but we can easily control versions using their tags. Would be interested to hear thoughts though.

alimanfoo commented 8 years ago

Do you know which steps are taking the most time in the build currently?

Whichever option we go for, I think we still want to build numpy (and possibly scipy?) from scratch against openblas, rather than install binaries. I know these steps are both very time consuming but the performance improvement from building against openblas for things like PCA is dramatic (order of magnitude).

On Mon, Apr 25, 2016 at 10:48 AM, Nick Harding notifications@github.com wrote:

I think my preference is for 1. It does require us hitching our wagon to anaconda, but we can easily control versions using their tags. Would be interested to hear thoughts though.


hardingnj commented 8 years ago

Numpy takes quite a while, but scipy takes ages... like > 40 minutes from source.

The other thing we could do is have a base image where we install numpy and scipy and pull from that?

Or, simplest of all, we could build locally and push images to Docker Hub instead of using the Docker Hub/GitHub automated build.

alimanfoo commented 8 years ago

On Mon, Apr 25, 2016 at 11:28 AM, Nick Harding notifications@github.com wrote:

Numpy takes quite a while, but scipy takes ages... like > 40 minutes from source.

Ouch.

The other thing we could do is have a base image where we install numpy and scipy and pull from that?

Or, most simple of all, we could build locally and push images to dockerhub instead of the docker hub/github interface

I have a mild preference for sticking with automated builds. It's less convenient, but it's harder to screw something up.


wrighting commented 8 years ago

My 2p: using an automated build to create a base image, and then working from that image, is a good way to go. (Does it help to give the build more resources?)

alimanfoo commented 8 years ago

Btw I think it's also worth considering starting from an Ubuntu 16.04 base image. With Python 3.5 as the default, it would simplify a number of the existing steps.

hardingnj commented 8 years ago

I've made a start on this, splitting some of the overhead into a "base" image.

I don't know how to check if we are installing numpy from source with openblas. The installation takes very little time, so I suspect we are not.

Additionally, I am having issues installing ipython 4.2.0/llvmlite

I'll push my changes to a branch.
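One way to approach the "is numpy linked against openblas?" question is to look at the build configuration numpy reports about itself. The following is a minimal sketch, not a definitive check: the helper names `mentions_openblas` and `numpy_build_config` are made up for illustration, and it assumes that a numpy linked against OpenBLAS names it somewhere in the text printed by `numpy.show_config()`. It says nothing about whether numpy was compiled from source or installed from a binary.

```python
import contextlib
import io


def mentions_openblas(config_text):
    """Return True if the given build-configuration text names OpenBLAS."""
    return "openblas" in config_text.lower()


def numpy_build_config():
    """Return the text printed by numpy.show_config(), or None if numpy is missing."""
    try:
        import numpy
    except ImportError:
        return None
    buffer = io.StringIO()
    # show_config() prints its report, so capture stdout rather than a return value.
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    text = numpy_build_config()
    if text is None:
        print("numpy is not installed")
    else:
        print("OpenBLAS mentioned in numpy build config:", mentions_openblas(text))
```

Running this inside the built image gives a quick yes/no. Timing a large matrix operation before and after a rebuild would be a complementary check.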

hardingnj commented 8 years ago

Maybe we can discuss later in the week. Hit a bit of a wall here :/

alimanfoo commented 8 years ago

Sure, skype tomorrow?

The latest pip installs a binary version of numpy, i.e., it bypasses compilation. This has changed since the last time we built a biipy image. Basically we just need to force pip to compile numpy; if openblas is already installed, numpy will detect it during the build process and build against it.
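As a rough sketch of what "force pip to compile numpy" could look like, pip's `--no-binary` option tells it to ignore prebuilt binaries for the named package and build from the source distribution instead. The helper name `source_build_command` is hypothetical, and this assumes a pip version recent enough to support `--no-binary`. Whether the resulting build actually picks up openblas still depends on numpy finding it at build time.

```python
import sys


def source_build_command(package="numpy"):
    """Build a pip command that installs `package` from source, skipping binaries.

    The package name appears twice: once as the argument to --no-binary
    (which packages must not use binaries) and once as the thing to install.
    """
    return [sys.executable, "-m", "pip", "install", "--no-binary", package, package]


if __name__ == "__main__":
    # Print the command instead of running it, so this sketch has no side effects.
    print(" ".join(source_build_command()))
```

To actually run the install, the list can be passed to `subprocess.check_call(source_build_command())`. In a Dockerfile the same arguments would go in a `RUN` line after the step that installs openblas.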

On Wednesday, April 27, 2016, Nick Harding notifications@github.com wrote:

Maybe we can discuss later in the week. Hit a bit of a wall here :/


hardingnj commented 8 years ago

Thanks all. Fixed in the newest version; ended up splitting the Dockerfile.