Closed by hardingnj 8 years ago
For the moment, I've removed simupop; we can think about how to address this later.
Thanks Nick, I have no immediate plans to use simupop, so it's fine to remove. At some point in the not-too-distant future we might consider using Bioconda to install binaries instead of installing everything from source via pip, but that needs some investigation; I haven't tried Bioconda yet.
Alistair Miles
Head of Epidemiological Informatics, Centre for Genomics and Global Health, http://cggh.org
The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, United Kingdom
Email: alimanfoo@googlemail.com / alimanfoo@gmail.com
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
I think I'm going to have a go with conda/bioconda
We have some decisions to make though.
Bioconda is something I wasn't aware of. For several of the things in biipy, we may want to think about writing recipes for conda/Bioconda. Jerome has done this for msprime. It doesn't seem to be much additional work; it's very similar to what you (AM) did with basemap/treemix.
I think my preference is for 1. It does mean hitching our wagon to Anaconda, but we can easily control versions using their tags. I'd be interested to hear thoughts though.
Do you know which steps are causing the most time in the build currently?
Whichever option we go for, I think we still want to build numpy (and possibly scipy?) from source against OpenBLAS, rather than install binaries. I know these steps are both very time-consuming, but the performance improvement from building against OpenBLAS for things like PCA is dramatic (an order of magnitude).
Numpy takes quite a while, but scipy takes ages... like > 40 minutes from source.
The other thing we could do is have a base image where we install numpy and scipy and pull from that?
Or, simplest of all, we could build locally and push images to Docker Hub ourselves, instead of using the Docker Hub/GitHub automated-build interface.
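The base-image idea could look something like this (image names, package choices, and the split point are all hypothetical, just to illustrate the approach):

```dockerfile
# biipy-base/Dockerfile -- slow-changing layer, rebuilt rarely
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y \
    build-essential gfortran libopenblas-dev python3-pip
# Compile numpy/scipy from source so they link against OpenBLAS
RUN pip3 install --no-binary numpy numpy && \
    pip3 install --no-binary scipy scipy

# biipy/Dockerfile -- fast layer, rebuilt on every change
# FROM cggh/biipy-base:latest
# RUN pip3 install <remaining packages>
```

The expensive numpy/scipy compiles then only rerun when the base image itself changes.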
Ouch.
I have a mild preference for sticking with automated builds. It's less convenient, but it's harder to screw something up.
My 2p: an automated build to create a base image, and then working from that image, seems a good way to go. (Does it help to give the build more resources?)
Btw, I think it's also worth considering starting from an Ubuntu 16.04 base image; with Python 3.5 as the default, it would simplify a number of the existing steps.
I've made a start on this. Splitting some of the overhead into a "base" image.
I don't know how to check whether we are installing numpy from source against OpenBLAS. The installation takes very little time, so I suspect we are not.
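One way to check (assuming numpy is importable inside the image) is to inspect numpy's build configuration, which lists the BLAS/LAPACK libraries it was compiled against:

```python
import numpy as np

# Prints the BLAS/LAPACK build configuration; if numpy was compiled
# against OpenBLAS, the output should mention "openblas".
np.show_config()
```

If the output shows no OpenBLAS entry (or the install finished in seconds), numpy almost certainly came from a pre-built wheel rather than a source build.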
Additionally, I am having issues installing ipython 4.2.0/llvmlite
I'll push my changes to a branch.
Maybe we can discuss later in the week. Hit a bit of a wall here :/
Sure, skype tomorrow?
Using the latest pip installs a binary (wheel) version of numpy, i.e., it bypasses compilation. This has changed since the last time we built a biipy image. Basically we just need to force pip to compile numpy; if OpenBLAS is already installed, numpy will detect it during the build process and build against it.
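A sketch of forcing the source build with pip's `--no-binary` flag (package names are just the ones discussed above; the apt package name assumes a Debian/Ubuntu base):

```shell
# Install OpenBLAS headers first so numpy's build can detect them.
apt-get install -y libopenblas-dev

# Force pip to build from the source distribution instead of
# downloading a pre-built wheel.
pip install --no-binary numpy numpy
pip install --no-binary scipy scipy
```

`--no-binary :all:` would do the same for every package in one go, at the cost of compiling everything.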
Thanks all. Fixed in the newest version; I ended up splitting the Dockerfile.
This is due to the addition of simupop, which takes ages to build.
Should we prune the Dockerfile or move to a push model?