NixOS / nixpkgs

Packaging Machine Learning Weights #46922

Open CMCDragonkai opened 6 years ago

CMCDragonkai commented 6 years ago

As mentioned before: https://github.com/NixOS/nix/issues/859#issuecomment-420491677

I'd like to get started packaging ML weights into Nixpkgs. With an increasing number of ML applications, these weights effectively are "source code" for the application. Examples include things like libpostal and MaskRCNN.

The first candidate is all the weights for Keras.

Here are all the packaged weights that Keras uses: https://github.com/fchollet/deep-learning-models/releases

Right now Keras (https://keras.io/applications/) code automatically tries to download those weights when you initialise those models.

This makes testing difficult when the checkPhase runs in an isolated container to prevent impurities.

If these weights were packaged, I could switch all the weight parameters to None to prevent Keras from downloading them, and instead explicitly load them from the Nix-packaged weights.

These weights are also used for transfer learning, where you bootstrap the NN using these weights.

I will submit a PR with this structure:

# right now pkgs/applications/science/machine-learning already exists
# so we create a weights directory there and a subdirectory of keras (as those weights are in HDF5 format and used by the keras framework)
mkdir -p pkgs/applications/science/machine-learning/weights/keras
# in the future other framework weight directories can be supported like tensorflow or torch
# create this file
pkgs/applications/science/machine-learning/weights/keras/default.nix

Then the weights will be accessible at the top level with this attribute path: machineLearningWeights.keras.resnet50. There are 2 variations of each model: with the top layers or without them ("notop"). This can be represented as a parameter to the function producing the derivation, and we can supply aliases that override the parameter using the callPackage pattern, like resnet50-notop, resnet50-tf-notop, etc.

There will be no setup hooks or PATHs or anything. The user of these derivations will get the full path by doing: ${drv}/weights.h5. This means all the weights will be saved in the /nix/store with the name of weights.h5.
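
To make the proposal concrete, here is a minimal sketch of what such a default.nix could look like. It is not the actual PR. The helper name mkResnet50, the includeTop parameter, the derivation names, the v0.2 release tag and the asset file names are all assumptions for illustration, and the hash is a placeholder that has to be replaced with the real one for each file.

```nix
# Hypothetical sketch of weights/keras/default.nix; not the actual PR.
{ lib, stdenvNoCC, fetchurl }:

let
  # Assumed location of the release assets. Check the tag and the file
  # names against the releases page before relying on them.
  baseUrl = "https://github.com/fchollet/deep-learning-models/releases/download/v0.2";

  # Function producing the derivation; includeTop selects the variation.
  mkResnet50 = { includeTop ? true }:
    stdenvNoCC.mkDerivation {
      name = "keras-resnet50-weights" + lib.optionalString (!includeTop) "-notop";

      src = fetchurl {
        url = "${baseUrl}/resnet50_weights_tf_dim_ordering_tf_kernels"
          + lib.optionalString (!includeTop) "_notop" + ".h5";
        # Placeholder only: each variation needs its own real hash.
        sha256 = lib.fakeSha256;
      };

      # The source is a single HDF5 file, so there is nothing to unpack or build.
      dontUnpack = true;

      # Every weights package exposes the file under the same name.
      installPhase = ''
        mkdir -p $out
        cp $src $out/weights.h5
      '';
    };
in
{
  resnet50 = mkResnet50 { };
  resnet50-notop = mkResnet50 { includeTop = false; };
}
```

The sketch uses plain function application to keep it in one file. Inside Nixpkgs the same thing would more likely be a per-model file wired up with callPackage, with the alias passing the overriding argument. A consumer would then interpolate the path, for example ${machineLearningWeights.keras.resnet50-notop}/weights.h5.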

In the future, datasets could also be added, such as the MNIST dataset, which is fairly small and used quite a lot.

I'm looking for some feedback, as I think Nix would be great for machine learning reproducibility. This would bridge the gap in synchronising versions between the code and the data (data that itself is code too).


Oh, and there's also the pkgs/data directory. Not sure if that would be a better place for these things. I would think things like the MNIST or ImageNet datasets would make sense to put there.

CMCDragonkai commented 5 years ago

I've done the initial packaging of the MNIST data into pkgs/data/machine-learning/mnist/default.nix. Next I can add ImageNet or COCO.

CMCDragonkai commented 5 years ago

https://github.com/keras-team/keras-applications/issues/52

stale[bot] commented 4 years ago

Thank you for your contributions.

This has been automatically marked as stale because it has had no activity for 180 days.
