Open CMCDragonkai opened 6 years ago
I've done the initial packaging of the MNIST data into pkgs/data/machine-learning/mnist/default.nix. Next I can add ImageNet or COCO.
Thank you for your contributions.
This has been automatically marked as stale because it has had no activity for 180 days.
If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.
As mentioned before: https://github.com/NixOS/nix/issues/859#issuecomment-420491677
I'd like to get started packaging ML weights into Nixpkgs. With an increasing number of ML applications, these weights represent "source code" for the application. Examples include things like libpostal and MaskRCNN.
The first candidate is all the weights for Keras.
Here are all the packaged weights that Keras uses: https://github.com/fchollet/deep-learning-models/releases
Right now Keras (https://keras.io/applications/) code automatically tries to download those weights when you initialise those models.
This makes testing difficult when the checkPhase runs in an isolated container to prevent impurities.
If these weights were packaged, I could switch all the weight parameters to None to prevent Keras from downloading them, and instead explicitly load them from the Nix-packaged weights.
These weights are also used for transfer learning, where you bootstrap the NN using these weights.
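As a rough illustration of the loading approach described above, here is a minimal Python sketch. It assumes the packaged file's path reaches the program through an environment variable; the name KERAS_RESNET50_WEIGHTS and both helper functions are hypothetical, not part of Keras or Nixpkgs. The Keras calls used are `ResNet50(weights=None, ...)` and `Model.load_weights(...)`, imported from `tensorflow.keras`, which is itself an assumption about the installed Keras distribution.

```python
import os
from pathlib import Path


def resolve_weights_path(env_var="KERAS_RESNET50_WEIGHTS"):
    """Return the path of pre-packaged weights named by a (hypothetical) env var."""
    value = os.environ.get(env_var)
    if not value:
        # Fail loudly instead of silently falling back to a network download.
        raise RuntimeError(f"{env_var} is not set; refusing to download weights")
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"weights file not found: {path}")
    return path


def build_resnet50(weights_path, include_top=True):
    """Build ResNet50 without triggering Keras' automatic weight download."""
    # Imported lazily so the path helper above works without TensorFlow installed.
    from tensorflow.keras.applications import ResNet50

    # weights=None means random initialisation, so nothing is fetched.
    model = ResNet50(weights=None, include_top=include_top)
    # Load the packaged weights explicitly. include_top has to match the file:
    # the "notop" weight files need include_top=False.
    model.load_weights(str(weights_path))
    return model
```

A test run inside the sandbox could then call `build_resnet50(resolve_weights_path())`, with the environment variable pointing at the weights file in the Nix store. Raising an error when the variable is missing is a deliberate choice here: it stops Keras from quietly attempting a download that the isolated build would block anyway.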
I will submit a PR with this structure:
Then the weights will be accessible at the top level with this attribute path: machineLearningWeights.keras.resnet50. There are 2 variations: without the top layers ("notop") or with them. This can be represented as a parameter to the function producing the derivation, and we can supply alternate aliases that override the parameter using the callPackage pattern, like resnet50-notop, resnet50-tf-notop, etc.
There will be no setup hooks or PATHs or anything. The user of these derivations will get the full path by doing: ${drv}/weights.h5. This means all the weights will be saved in the /nix/store with the name weights.h5.
In the future, datasets could also be added, like the MNIST dataset, which is fairly small and used quite a lot.
I'm looking for some feedback, as I think Nix would be great for machine learning reproducibility. This would bridge the gap in synchronising versions between the code and the data (data that is itself code too).
Oh, and there's also the pkgs/data directory. Not sure if that would be a better place to put these things. I would think things like the MNIST or ImageNet datasets would make sense to put there.