Package mountainlab (+plugins) using conda

tjd2002 commented 6 years ago

Once we figure out how to run mountainlab in an isolated env (#14), consider whether we can package it using conda.

The goal is to be able to run: conda install -c flatironinstitute mountainlab-js mountainlab-ephys-plugins (or similar) and have all dependencies pulled in and configured, binaries on the path, etc.

Nodejs and mongo are both already provided as conda packages, and conda claims to support distribution of js apps, so I think this might be doable without heroics.

tjd2002 commented 6 years ago

@magland please assign this issue to me

tjd2002 commented 6 years ago

I think this should default to being totally self-contained: use conda to get nodejs and mongodb within the environment, and set up a MongoDB instance within the env as well. Then, if users want to use a system-wide mongo instance instead, that can be a configuration option.
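Here is a minimal shell sketch of one way the "self-contained by default, system-wide as an option" behaviour could work. The variable ML_MONGODB_URL, the data directory $CONDA_PREFIX/var/lib/mongodb and the helper names are all hypothetical, not existing mountainlab settings. Only the mongod flags are real.

```shell
# Hypothetical helpers: env-local MongoDB by default, system-wide instance
# only if the user opts in by setting ML_MONGODB_URL (hypothetical name).

# Data directory for the env-local mongod (hypothetical layout).
ml_mongo_dbpath() {
  [ -n "${CONDA_PREFIX:-}" ] || { echo "not inside a conda env" >&2; return 1; }
  printf '%s\n' "$CONDA_PREFIX/var/lib/mongodb"
}

# Connection URL: the user's override if set, else the env-local default.
ml_mongo_url() {
  if [ -n "${ML_MONGODB_URL:-}" ]; then
    printf '%s\n' "$ML_MONGODB_URL"
  else
    printf '%s\n' "mongodb://127.0.0.1:27017"
  fi
}

# Start the env-local server, listening on localhost only.
# --fork requires --logpath (and is not available on Windows).
ml_start_local_mongod() {
  dbpath=$(ml_mongo_dbpath) || return 1
  mkdir -p "$dbpath"
  mongod --dbpath "$dbpath" --bind_ip 127.0.0.1 --port 27017 \
    --fork --logpath "$dbpath/mongod.log"
}
```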

tjd2002 commented 6 years ago

Made some initial attempts here to package up qt-mountainview (to start): https://github.com/tjd2002/qt-mountainview/tree/conda-packaging

The built package is at: https://anaconda.org/franklab/qt-mountainview and can be installed (from anywhere on the internet) with: conda install -c franklab qt-mountainview

Currently it is only built for Linux, just pulls in conda's qt, doesn't depend on mountainlab-js, and hasn't been tested on other machines.

For now, this conda package installs the mv.mp mountainlab-js package to $CONDA_PREFIX/lib/mountainlab-js/packages

When we conda-package mountainlab-js proper, we'll need to configure it to look in that location and not use the ~/.mountainlab/ directory for anything, since that would break the isolation that is the whole reason for this approach. These changes are being tracked at #14.
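Here is a minimal sketch of that lookup rule. The helper name is hypothetical. Outside conda, it assumes packages live under ~/.mountainlab/packages; that subdirectory is a guess and was not taken from mountainlab-js.

```shell
# Hypothetical helper: where mountainlab-js should look for packages.
# Inside a conda env, use only the env's own directory, so nothing is read
# from (or written to) the user's ~/.mountainlab/.
ml_packages_dir() {
  if [ -n "${CONDA_PREFIX:-}" ]; then
    printf '%s\n' "$CONDA_PREFIX/lib/mountainlab-js/packages"
  else
    # Outside conda: fall back to a per-user directory (assumed layout).
    printf '%s\n' "$HOME/.mountainlab/packages"
  fi
}
```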

magland commented 6 years ago

Cool! Will check it out later in the week.

tjd2002 commented 6 years ago

The trickiest part of this is going to be dealing with npm (which is itself a package manager) within conda.

It turns out jupyterlab relies on npm (or actually Facebook's drop-in replacement for npm, called yarn) for installing plugins, and the jupyterlab devs have spent the last year or so working out the implications of this for distributing their software through any mechanism other than npm itself (especially conda): https://github.com/jupyterlab/jupyterlab/issues/2712 https://github.com/jupyterlab/jupyterlab/issues/2065

It looks pretty hairy! I don't understand all the nooks and crannies; however, most of the complications they are hitting seem to stem from:

- needing to support offline installs, where they can't just call npm install. Instead, they propose using local yarn mirrors provided by each package, then merging these mirrors somehow when someone installs a new plugin.
- wanting to take advantage of npm's ability to deduplicate dependencies and resolve dependency conflicts (since all their extensions will run in the same browser process?)
- wanting to avoid 'post-link' scripts (i.e. packages running arbitrary commands after being installed to set themselves up), since these can cause unanticipated and difficult-to-debug problems (they break the model of the package manager).

I think that we can avoid almost all of this hassle in mountainlab-js, because we don't really have an 'extensible' application with true plugins (requiring compatible versions of shared libraries). Instead, ML-Js and each ML package is a completely separate beast, with ml-run-process and friends calling executable processors. Below are a few options to consider. I'm going to start with 1):

1) At (conda package) build time, we run npm install and then bundle up all the dependencies (tar up the whole directory, including node_modules?) and ship them out as-is. On install, put a link in a common mountainlab/packages directory, and/or put any provided binaries in the conda env's bin directory (so they appear on the path). In this case the end user doesn't need to use or know about npm. I think the biggest loss would be deduplication (e.g. the ephys-viz npm package produced by npm pack is ~27k, but the full install is 261MB, and another package may use many of the same dependencies), but that may be a reasonable price to pay for radical simplification. To the end user this looks like: conda install -c flatironinstitute mountainlab-js ephys-viz qt-mountainview
2) If this proved impractical for some reason and we wanted to run a true npm install, I think we could afford to be less 'pure' than jupyterlab and just not support offline installs. This would mean requiring npm at install time for ML-js packages that use node, and then running npm install --global . as part of their conda package install. Binaries would likewise be automatically available.
3) A hybrid conda/npm solution, where we rely on conda to provide the isolation of a virtual environment and install the dependencies (nodejs, mongo), then use npm for all mountainlab-js software and packages. (This is similar to the common case where folks use pip to install packages in a conda env when the package is not in conda's repos.) If we went this route, we could provide a mountainlab-js-setup 'metapackage' that pulled in the dependencies and set up various ML configuration environment variables. The install procedure would look like:

    conda install -c flatironinstitute mountainlab-js-setup
    npm install --global mountainlab-js ephys-viz  # assuming these have been published to the npm registry
    conda install qt-mountainview

As can be seen in this snippet, a problem arises if we have mountainlab-js packages that aren't packaged using npm (like qt-mountainview). The conda package manager has no way of knowing which npm packages are installed, so the qt-mountainview package wouldn't be able to express its dependency on mountainlab-js; it would just have to hope that it was already there.

Since a major point of this exercise (for me) is to provide a 'batteries included' way of working with mountainlab, and since not all mountainlab packages will be written in javascript, I am disinclined to go with 3).
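For option 1, the "put a link in a common mountainlab/packages directory" step could be done at build time in the recipe's build.sh. Below is a minimal sketch. It assumes the bundled package was installed with npm install -g into the conda prefix, so it sits in $PREFIX/lib/node_modules/<name> (npm's global layout on Linux/macOS). In that case npm has already linked its binaries into $PREFIX/bin. The helper name is hypothetical. The symlink is relative on purpose, so it still resolves after the package is installed into a different prefix.

```shell
# Hypothetical build-time helper for option 1: expose an already-bundled
# package (node_modules shipped as-is) in the shared packages directory.
# Assumes the package is at $1/lib/node_modules/$2.
ml_register_package() {
  prefix=$1 name=$2
  pkgdir="$prefix/lib/mountainlab-js/packages"
  [ -d "$prefix/lib/node_modules/$name" ] || {
    echo "not installed: $name" >&2
    return 1
  }
  mkdir -p "$pkgdir"
  # Relative link packages/<name> -> ../../node_modules/<name>, so it
  # does not depend on the absolute path of the build prefix.
  ln -sfn "../../node_modules/$name" "$pkgdir/$name"
}
```

In build.sh this would be called as ml_register_package "$PREFIX" ephys-viz after the npm install -g step.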

magland commented 6 years ago

I guess option #1 is simplest, and the only downside is the total size of the package?

But I like option #2. Doesn't nodejs come bundled with npm in later versions, though?

tjd2002 commented 6 years ago

Good point: npm will be present for all users, but it adds complexity if we use it as part of the conda install process (consider: managing uninstallation, npm package updates resulting in different versions of js libraries for the same version of our conda package...). There are workarounds for these (uninstall hooks, package-lock.json...), but overall it means we have to manage and integrate two different package managers instead of just one.
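For the package-lock.json workaround, an install step could use npm ci. It installs exactly the versions recorded in package-lock.json and fails if the lock file is missing or out of sync with package.json. This is a minimal sketch, and the helper name is hypothetical. It only addresses version drift, not uninstallation.

```shell
# Hypothetical helper: pinned dependency install for one package directory.
# Refuses to run without a lock file, then lets `npm ci` install exactly
# what package-lock.json records (npm ci itself errors if the lock file
# and package.json disagree).
ml_locked_install() {
  dir=$1
  if [ ! -f "$dir/package-lock.json" ]; then
    echo "no package-lock.json in $dir; refusing unpinned install" >&2
    return 1
  fi
  (cd "$dir" && npm ci)
}
```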

tjd2002 commented 6 years ago

I did a lot of work towards this in #27. Regarding package size, once I pruned all the devDependencies, the total size of the (zipped) package is now down around 5MB! (This could change if it turns out we do need to ship webpack, or if I otherwise screwed up the packaging).

I also did some research and discovered that option 1 is the preferred way to do this (I think for the reasons I laid out above). In particular, conda-forge packages lots of npm packages this way. You can see how they do it at this GitHub code search link. Note that the npm install -g step always happens in the build step of the conda recipe. This means it happens when the developer prepares the conda package, not when the user installs it.

Some of those recipes use npm pack to create a tarball, then npm install to install that tarball. This would be nice because npm pack simulates the regular npm publishing step (it respects .npmignore, for instance). But there seems to be an npm bug when installing from a tarball this way for a package that contains bundled local dependencies. So I think I will just do it the simpler way: a clean clone of the git repo, then npm install -g.
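The npm pack + npm install route might look roughly like this inside a recipe's build.sh, where conda-build sets $PREFIX. This is a sketch. The helper name is hypothetical, and it assumes nodejs (which ships npm) is in the recipe's build requirements. Installing the tarball globally pulls in only the package's "dependencies", not its devDependencies. npm also links any "bin" entries into $PREFIX/bin. The simpler alternative is to run npm install -g from a clean clone.

```shell
# Sketch of a conda-build step: pack the source like `npm publish` would,
# then install that tarball globally into the conda build prefix.
ml_pack_and_install() {
  src=$1
  # npm pack honours .npmignore / the "files" field and prints the
  # tarball's file name as the last line of its stdout.
  tarball=$(cd "$src" && npm pack | tail -n 1) || return 1
  [ -n "$tarball" ] || return 1
  # --prefix makes the global install land in $PREFIX/lib/node_modules,
  # with executables linked into $PREFIX/bin.
  npm install -g --prefix "$PREFIX" "$src/$tarball"
}
```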

tjd2002 commented 6 years ago

@magland has simplified the npm packaging even further, which makes conda packaging even more straightforward. (In particular it allowed me to circumvent the npm bug I ran into, so I can use the neat trick of using npm pack+install to emulate a true npm install through the npmjs registry).

The mountainlab-js conda package is down to about 2MB, so the file-size worries above turn out to be misplaced for now. We'll see what happens when I try to package up ephys-viz using this strategy.

Remaining issues before closing:

[x] Set up mountainlab environment variables on conda activate? (cf. #14) [Done in PR #31]
[x] Package up mountainlab packages
- [See long list in spreadsheet linked in next comment]
[x] Create meta-packages
[ ] Write tests of pipelines #28
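For the environment-variable item, conda sources every *.sh file in $CONDA_PREFIX/etc/conda/activate.d/ on conda activate, and those in etc/conda/deactivate.d/ on conda deactivate. A recipe can therefore ship a pair of such scripts. The sketch below wraps their bodies in functions so they can be tested. ML_PACKAGE_SEARCH_DIRECTORY is a hypothetical name; the variables actually used are whatever PR #31 defines. The same save/restore pattern extends to any other variable.

```shell
# Body of a hypothetical etc/conda/activate.d/mountainlab.sh: point
# mountainlab at the active env, remembering any previous value.
ml_activate() {
  if [ -n "${ML_PACKAGE_SEARCH_DIRECTORY+x}" ]; then
    export _ML_OLD_PACKAGE_SEARCH_DIRECTORY="$ML_PACKAGE_SEARCH_DIRECTORY"
  fi
  export ML_PACKAGE_SEARCH_DIRECTORY="$CONDA_PREFIX/lib/mountainlab-js/packages"
}

# Body of the matching etc/conda/deactivate.d/mountainlab.sh: restore the
# previous value, or unset the variable if there was none.
ml_deactivate() {
  if [ -n "${_ML_OLD_PACKAGE_SEARCH_DIRECTORY+x}" ]; then
    export ML_PACKAGE_SEARCH_DIRECTORY="$_ML_OLD_PACKAGE_SEARCH_DIRECTORY"
    unset _ML_OLD_PACKAGE_SEARCH_DIRECTORY
  else
    unset ML_PACKAGE_SEARCH_DIRECTORY
  fi
}
```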

tjd2002 commented 6 years ago

Making good progress on all of these, plus a few more. I've created a new repo tjd2002/mountainlab-conda to contain all the recipes in one place (will help with creating coherent 'releases' of multiple packages at once); once that's ready I'll remove the conda recipes that are already in some of the repos.

There's also a spreadsheet at https://docs.google.com/spreadsheets/d/1hFiNEQWope6t_IN-sSGIZa1oQ-_jXVCXwSN3wlYSHmg/edit#gid=0 with information on all the different software components that currently go into mountainlab/mountainsort.

Notably, we are planning to package and ship old versions of the visualization tools (qt-mountainview, qt-mountaincompare) and processing packages (ms3 for cluster metrics like isolation distance, pyms for automatic curation) with mountainlab-js until the replacements are ready.

tjd2002 commented 6 years ago

OK, we now have a working conda install pathway! Documentation is still missing, but that's already covered in #37.

There is also now a 'flatiron' channel on Anaconda.org: https://anaconda.org/flatiron which contains the latest conda packages, including a metapackage called "mountainsort" that installs mountainlab-js, all the needed processor plugins for sorting, and various dependencies.

flatironinstitute / mountainlab-js

Package mountainlab (+plugins) using conda #15