flatironinstitute / mountainlab-js

MountainLab is data processing, sharing and visualization software for scientists. It is built around MountainSort, spike sorting software, but is designed to be more generally applicable.

Proposal: standalone installer #52

Open tjd2002 opened 6 years ago

tjd2002 commented 6 years ago

A standalone, single-step installer that does not require installing, configuring, or using Conda.

Conda constructor is the tool used to create custom Conda distributions (like miniconda, Anaconda, etc). It creates single-file .sh, .pkg, or .exe installers that include a pre-selected set of conda packages.

They are really easy to build with a single config file (see, e.g., https://github.com/flatironinstitute/mountainlab-conda/tree/master/constructor.ms4 ), but they don't currently function as a 'click-and-run' software installer. In particular, if you put the install's /bin directory on the path, you get all the binaries from all the dependencies in the conda env, which clobbers much of the user's other software (python, qmake, npm, ...). That is not what is expected of an installer.
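To make the clobbering concrete, here is a minimal sketch (POSIX only) that lists which executables in an env's bin directory would shadow commands the user already has. The function name `shadowed_commands` and the example path are hypothetical, not part of constructor or conda.

```python
import os
import shutil


def shadowed_commands(env_bin, path=None):
    """Return (name, existing_path) pairs for executables in env_bin that
    already exist somewhere on `path`.

    `path` should be the PATH as it looks *before* env_bin is added to it.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    env_bin_abs = os.path.abspath(env_bin)
    shadowed = []
    for name in sorted(os.listdir(env_bin)):
        candidate = os.path.join(env_bin, name)
        # Only executable regular files (or symlinks to them) can shadow anything.
        if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
            continue
        existing = shutil.which(name, path=path)
        if existing and os.path.dirname(os.path.abspath(existing)) != env_bin_abs:
            shadowed.append((name, existing))
    return shadowed


if __name__ == "__main__":
    # Hypothetical install location; adjust to wherever the installer unpacked.
    env_bin = os.path.expanduser("~/mountainlab/bin")
    if os.path.isdir(env_bin):
        for name, existing in shadowed_commands(env_bin):
            print(f"{name} would shadow {existing}")
```

Running this against a real install gives a rough count of how many of the user's existing commands would be replaced by putting the env on the path.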

I have proposed an enhancement to constructor at this issue which would provide specified 'entry points' for specified apps. I'm not sure if it will get any uptake from the conda devs, but there's probably a way to implement it ourselves. Using either conda-run or exec-wrappers, one can run an executable in a conda env without having to activate the env in your shell.
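As a rough illustration of the 'entry point' idea, the sketch below writes a small POSIX shell wrapper that runs one executable from the env with the env's bin directory prepended to PATH for that process only. This is not how conda-run or exec-wrappers are implemented; a real activation also sets other variables and runs activation scripts. The names `write_wrapper` and `WRAPPER_TEMPLATE`, and the example paths, are hypothetical.

```python
import os
import shlex
import stat

# Assumption: prepending the env's bin directory to PATH is enough for the
# wrapped program. Real conda activation does more than this.
WRAPPER_TEMPLATE = """#!/bin/sh
# Generated wrapper: run {name} from the bundled env without activating it.
PATH={env_bin}:"$PATH"
export PATH
exec {target} "$@"
"""


def write_wrapper(env_prefix, name, dest_dir):
    """Create dest_dir/name, a wrapper that execs env_prefix/bin/name."""
    env_bin = os.path.join(env_prefix, "bin")
    target = os.path.join(env_bin, name)
    os.makedirs(dest_dir, exist_ok=True)
    wrapper = os.path.join(dest_dir, name)
    with open(wrapper, "w") as f:
        f.write(
            WRAPPER_TEMPLATE.format(
                name=name,
                env_bin=shlex.quote(env_bin),
                target=shlex.quote(target),
            )
        )
    # Mark the wrapper executable for user, group and others.
    mode = os.stat(wrapper).st_mode
    os.chmod(wrapper, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return wrapper


if __name__ == "__main__":
    # Hypothetical prefix and entry-point directory.
    prefix = os.path.expanduser("~/mountainlab")
    if os.path.isdir(os.path.join(prefix, "bin")):
        print(write_wrapper(prefix, "ml-run-process", os.path.expanduser("~/ml-bin")))
```

Only the directory holding the wrappers would then go on the user's path, so the env's python, qmake, npm and so on stay hidden.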

Even though we would hide the 'conda'-ness of this install method, it would still be a full-fledged conda environment under the hood, which means the install could be updated (with, e.g., conda update -p /path/to/mountainsort --all) or extended (conda install -p /path/to/mountainsort -c flatiron fancy_new_plugin). We could even provide this functionality with wrapper commands like ml-update or ml-get-package <packagename>.
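A minimal sketch of what such wrapper commands might do is below. It only assembles the two conda command lines quoted above and defaults to a dry run; the function names are hypothetical, and it assumes a conda executable is reachable on PATH when actually run.

```python
import shlex
import subprocess


def update_command(prefix):
    """Command line for a hypothetical ml-update: update everything in the env."""
    return ["conda", "update", "-p", prefix, "--all"]


def install_command(prefix, package, channel="flatiron"):
    """Command line for a hypothetical ml-get-package <packagename>."""
    return ["conda", "install", "-p", prefix, "-c", channel, package]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.call(cmd)


if __name__ == "__main__":
    prefix = "/path/to/mountainsort"
    run(update_command(prefix))
    run(install_command(prefix, "fancy_new_plugin"))
```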

I could also see this being a nice way to distribute things like lari/kbucket (which need to run persistently), since it manages dependencies, integrates with the way we distribute other components, and doesn't require the use of any package manager by the end user.

One downside of this is that you ship EVERYTHING in one big bundle. For Qt or Electron apps this adds up to hundreds of MB fairly quickly. For a first-time install this is no penalty, since you'd need to download those dependencies regardless; but it could add up and be unwieldy if there are multiple versions kicking around. Installing via conda directly gets around this, since dependencies are shared across envs; maybe there's a way to get a 'constructor'-built installer to take advantage of that.

magland commented 6 years ago

Interesting idea. Sounds like it could be more generally applicable too.

tjd2002 commented 6 years ago

Yeah--I like the idea of something that behaves like an installer, but has a full package manager/virtualenv on the backend. I'm sure there are equivalent/similar things out there using containers (that behave like apps).

tjd2002 commented 6 years ago

I just uploaded a couple single-file installers to https://github.com/flatironinstitute/mountainlab-conda/releases . These suffer from the problems mentioned above (they are huge, and they clobber all sorts of stuff on your path, just like activating a conda env does), but it is pretty cool to have everything installed in one go.

The exec-wrappers command that I mentioned above lives at: https://github.com/gqmelo/exec-wrappers (NB it can also be used with virtualenvs).

I think it might be possible to set up the 'entry points' as I proposed with a post-run script on the constructor that creates the exec-wrappers. (I.e. no need to wait for any changes to be made to constructor).

tjd2002 commented 6 years ago

Just tested using the installer to install to the /envs/ folder of an existing conda install. Interestingly, the new install does show up as a new conda env (i.e. you can switch to it with conda activate <installername>) but it doesn't seem to take advantage of conda's deduplication functionality (done using hard links).

[I tested this by running the installer twice, into two differently named folders, then running ls -li in each bin directory. The first column is the file's inode (the pointer to the file on disk) and the third is the number of hard links that exist to that file on the current filesystem. find . -inum <inode> lets you see which files on disk share a particular inode. Conda-installed envs share the same binaries across multiple envs, but for constructor-installed binaries the number of links for regular files is always 1.]
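The same check can be scripted. The sketch below (POSIX only) reads the inode and hard-link count of each regular file in a directory and reports which file names in two directories point at the same inode. The function names and example paths are hypothetical.

```python
import os


def link_info(directory):
    """Map file name -> (device, inode, link count) for regular files in directory.

    Symlinks are not followed, matching what ls -li shows for each entry.
    """
    info = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                info[entry.name] = (st.st_dev, st.st_ino, st.st_nlink)
    return info


def shared_between(dir_a, dir_b):
    """Names present in both directories that are hard links to the same file."""
    a, b = link_info(dir_a), link_info(dir_b)
    return sorted(
        name for name in a.keys() & b.keys()
        if a[name][:2] == b[name][:2]  # same device and same inode
    )


if __name__ == "__main__":
    # Hypothetical locations of two installs to compare.
    first = os.path.expanduser("~/miniconda3/envs/install_a/bin")
    second = os.path.expanduser("~/miniconda3/envs/install_b/bin")
    if os.path.isdir(first) and os.path.isdir(second):
        print(shared_between(first, second))
```

An empty result for two constructor-built installs, next to a non-empty one for two conda-created envs, would match the observation above.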

tjd2002 commented 6 years ago

For reference, a project that uses Conda under the hood as part of a more sophisticated system-wide install is The Littlest JupyterHub (tljh), out of Berkeley(?), which aims to be something like a JupyterHub in a box for classroom use (see blog post here). During their install, they use Conda to set up a system env containing JupyterHub+notebook+lab+..., and also to set up a shared, read-only user conda env, which then becomes the base environment for logged-in users. Sysadmins can add packages to the user env, create template notebooks for demos, etc.

For a while they were using Constructor to do the conda installs, but they encountered bugs and limitations (http://words.yuvi.in/post/conda-constructor-thoughts/), so they switched to just scripting the install using the miniconda installer + conda install commands.

Among other things, their installer sets up systemd launchers for jupyterhub, with per-user resource limits, etc, etc. Could be a useful reference for how to do more complex installs of MountainLab down the road.