Yelp / MOE

A global, black box optimization engine for real world metric optimization.

Conda Installer recipe #417

Closed rmcgibbo closed 9 years ago

rmcgibbo commented 9 years ago

cc #416. As I said there, I'm not sure if you guys really want to maintain the installers for the whole dependency stack. It's possible that I was a little overzealous here -- there might be a couple packages that are included in the base conda repos [1] that aren't necessary here.

[1] http://repo.continuum.io/pkgs/free/linux-64/index.html

rmcgibbo commented 9 years ago

It looks like ssl_match_hostname, beautifulsoup, pyramid, weberror and zope.deprecation are in the base repos and can be removed from this PR.

rmcgibbo commented 9 years ago

Also, the sphinx stuff can probably be removed, since that's just for building the docs.

sc932 commented 9 years ago

I'm not super familiar with conda, is there a generic place where we could just submit the binaries of our dependencies so that we don't need to host them?

If we can make sure this is stable moving forward I am happy to merge this in.

Last few action items:

  1. Add a blurb explaining how to do this to the install docs in the main docs area (optionally including possible fallbacks if the binary host is dead)
  2. Make sure this is as minimal as possible

Thanks so much @rmcgibbo!

rmcgibbo commented 9 years ago

Yes, that place is binstar.org. After building the binaries, run conda install binstar; binstar upload path/to/binary.tar.bz2 to upload them. You need to make an account there first, e.g. https://binstar.org/rmcgibbo

sc932 commented 9 years ago

Awesome, if you want to minify the reqs of custom binaries I'm happy to pull this in. Thanks again.

rmcgibbo commented 9 years ago

I just tweeted at some of the conda developers, so there might be a possibility that some of the dependencies could be incorporated into the base conda repos.

sc932 commented 9 years ago

Sweet! Thanks @rmcgibbo!

rmcgibbo commented 9 years ago

One question: simplejson. You guys currently depend on simplejson, but I think (http://stackoverflow.com/questions/712791/what-are-the-differences-between-json-and-simplejson-python-modules) that it's just the same package that is included in the stdlib as json. Using:

try:
    import simplejson as json
except ImportError:
    import json

would be able to eliminate one dependency, I think (or do you know if some simplejson-specific functionality is used?)
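As a minimal sketch of how the fallback import behaves in practice: code written against the name `json` runs unchanged whichever module was actually imported, as long as it sticks to the shared `dumps`/`loads` interface. The payload below is hypothetical and only for illustration; it is not taken from MOE.

```python
# Prefer simplejson when it is installed; otherwise fall back to the stdlib.
try:
    import simplejson as json
except ImportError:
    import json

# Hypothetical payload, for illustration only (not an actual MOE request).
payload = {"points_to_sample": [[0.1, 0.2], [0.3, 0.4]], "num_to_sample": 2}

# Only the interface shared by both modules is used here.
encoded = json.dumps(payload, sort_keys=True)
decoded = json.loads(encoded)

print(json.__name__)       # shows which implementation was imported
print(decoded == payload)  # the round trip preserves the data
```

If some simplejson-specific option were used somewhere, this pattern would break on machines without simplejson, which is why the question about simplejson-specific functionality matters.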

sc932 commented 9 years ago

Yeah, we can move back to json instead of simplejson; that was an old habit from the internal Yelp systems.

rmcgibbo commented 9 years ago

do you want the try-except import, or just to use the stdlib's json unconditionally?

suntzu86 commented 9 years ago

That link indicates it's preferable to use simplejson b/c it's updated more frequently (esp until we switch to python 3, we won't be getting anything new in pkg json). I'm all for having a fallback but I think we should retain the dependency on simplejson.

sc932 commented 9 years ago

We tell people to use python 2.6+ so there shouldn't be an issue with json not being in the stdlib. But I guess the try/except allows us to get the latest and greatest simplejson (although we would only benefit from speed boosts since we don't do anything special).

suntzu86 commented 9 years ago

Er, I meant the json in the python 2.6/2.7 stdlib will not be updated in the future. So if we want to keep up with new developments, we should keep the requirement?

Will simplejson and json remain compatible for all time? Just seems simpler to leave it around; I don't think it adds much to the build time/complexity.

rmcgibbo commented 9 years ago

okay, np. I won't mess with it.

rmcgibbo commented 9 years ago

Okay. I think this is ready for someone else to take a spin with. I added a Vagrantfile that sets up the build environment for the binaries.

sc932 commented 9 years ago

Excellent. I'll give it a shot tomorrow and we can merge it in. Thanks again @rmcgibbo!

rmcgibbo commented 9 years ago

Yeah, I'm happy to use a different one. I just didn't want to introduce an external dependency into the conda install, security-wise.

sc932 commented 9 years ago

Hey @rmcgibbo, I'm having trouble building conda on an Ubuntu 14.04 system with the pull as-is. It looks like you copy libstdc++.so explicitly from /usr/lib, but I have that nested within different version directories (gcc-4.7, gcc-4.8 etc). This may be just an Ubuntu version issue, but is there a standard way for conda to find dependencies like this, regardless of where they are?

rmcgibbo commented 9 years ago

You should only need to build the package once, and the build needs to be done on a really old linux to ensure compatibility -- thus the vagrantfile using ubuntu 10.04

rmcgibbo commented 9 years ago

When you build binaries on a later version of linux, because of the way the symbols in libc are versioned, you'll only be able to run those binaries on that version of linux or later, or you get things like /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.15' not found.
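To make the versioning rule concrete, here is a rough sketch in Python. It assumes you already have the list of `GLIBC_x.y` symbol-version tags a binary references (for example, read off by hand from a tool such as `objdump -T`); the tag lists and the host version below are hypothetical values chosen for illustration, and the helper names are not part of conda or any library.

```python
import platform
import re


def parse_glibc_tag(tag):
    """Turn a symbol-version tag such as 'GLIBC_2.15' into a tuple like (2, 15)."""
    match = re.fullmatch(r"GLIBC_(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError("not a GLIBC version tag: %r" % tag)
    return tuple(int(part) for part in match.group(1).split("."))


def required_glibc(tags):
    """The newest tag a binary references is the oldest glibc it can run on."""
    return max(parse_glibc_tag(tag) for tag in tags)


def can_run(tags, host_version):
    """True if a host with the given glibc version provides every referenced tag."""
    return host_version >= required_glibc(tags)


# Hypothetical tag lists: the same program built on a newer and an older Linux.
built_on_new_linux = ["GLIBC_2.2.5", "GLIBC_2.4", "GLIBC_2.15"]
built_on_old_linux = ["GLIBC_2.2.5", "GLIBC_2.4"]

# Assumed glibc version of an older host, for illustration only.
old_host = (2, 11)

print(can_run(built_on_new_linux, old_host))  # newer build on the older host
print(can_run(built_on_old_linux, old_host))  # older build on the older host

# glibc version of the machine running this script (may be empty on non-glibc platforms).
print(platform.libc_ver())
```

The binary built on the old system references only old tags, so it runs everywhere from that version upward; that is the reason for building once on an old distribution.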

sc932 commented 9 years ago

Gotcha, I was reading the docs wrong...

Testing the install now. Sorry for the confusion!

suntzu86 commented 9 years ago

@rmcgibbo can the boost thing be configured? like default to using the prebuilt package but optionally build-from-scratch if the user is concerned? I would hope people aren't uploading 'virused' libraries but I guess in principle it's possible.

rmcgibbo commented 9 years ago

The user doesn't build anything from scratch -- the people who do the packaging (developers, etc) build the binaries once and then the users just install them. Conda has the concept of "channels", which are kind of like Ubuntu PPAs -- there are the base channels published by the company that backs conda (ContinuumIO), and then there are binstar channels where you can put your stuff up, and users have to opt in for your channel. Boost isn't included in the base channels. So if hypothetically Yelp were to put up a binstar channel at binstar.org/yelp, to get MOE users would first add the yelp channel, and then do conda install moe. If the MOE binary declares boost as a dependency, the installer is going to need to go out and find boost, and unless the user has also added some additional channels, the only places it will look are in the base channels and the binstar.org/yelp channel -- it won't just find mutirri's personal channel that contains some boost binaries.
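The lookup rule described above can be sketched as a toy model in Python. This is not conda's actual resolver: the channel contents, the dependency list for moe, and the function names are all hypothetical, and the only point is that a dependency is found only in channels the user has opted into.

```python
# Hypothetical channel contents; names and packages are illustrative only.
CHANNELS = {
    "defaults": {"numpy", "scipy", "pyramid"},
    "yelp": {"moe"},
    "mutirri": {"boost"},
}

# Hypothetical dependency declaration for the MOE binary.
DEPENDS = {"moe": ["boost", "numpy", "scipy"]}


def find_channel(package, enabled):
    """Return the first enabled channel that provides the package, else None."""
    for name in enabled:
        if package in CHANNELS.get(name, set()):
            return name
    return None


def resolve(package, enabled):
    """Map a package and its direct dependencies to the channels providing them."""
    plan = {}
    for name in [package] + DEPENDS.get(package, []):
        channel = find_channel(name, enabled)
        if channel is None:
            raise LookupError("%s not found in channels %s" % (name, enabled))
        plan[name] = channel
    return plan


# With only the base channels and the yelp channel, boost cannot be found.
try:
    resolve("moe", ["defaults", "yelp"])
except LookupError as error:
    print("install fails:", error)

# Once the user also opts in to the channel hosting boost, resolution succeeds.
print(resolve("moe", ["defaults", "yelp", "mutirri"]))
```

In this model, copying boost into the yelp channel's package set would make the first call succeed without the user adding a third channel.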

rmcgibbo commented 9 years ago

It would be possible, for instance, to download mutirri's boost binaries and re-upload them into yelp's binstar channel. Or the install docs could say, "first add our binstar channel. then also add mutirri's. and then you can conda install moe."

But all of these binaries only need to be built once. When you come out with a new version of MOE, you probably don't need to update to the very latest version of boost (?), so you wouldn't need to build new packages for boost.

suntzu86 commented 9 years ago

D'oh, sorry about the super late follow-up on this :(

I don't think we're going to be able to set up a yelp binstar channel. Maybe we can create one just for MOE. But for now I'm merging this in so people can use it as a 'curated' build with 'long term support' plans through our own channel to come later. Sound reasonable to you?

Thanks again for setting up conda for us @rmcgibbo! I'm going to go ahead and merge this later today or tomorrow. Adding a short TODO for myself here: