NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Documentation: jupyter doc & API discussion #269331

Open teto opened 9 months ago

teto commented 9 months ago

Problem

I first thought I could rely on jupyterwith (now jupyenv, https://github.com/tweag/jupyenv), but it has been unreliable, and having strong Jupyter foundations in nixpkgs is helpful anyway. Jupyter is so complex that it is the perfect ecosystem for Nix to shine (notwithstanding the ever-evolving Python packages).

Proposal

With 23.05 branched off, I wonder if anyone would be willing to start documenting, in the nixpkgs documentation, how to create a Jupyter notebook with several kernels available? I am currently trying to build one such notebook, so I could prepare a skeleton, but I might be short on time. My goal would be to just get the ball rolling so that we have nice documentation for 24.05.

@thomasjm you might be the most motivated with the quarto project ?

@natsukium @GTrunSec @GaetanLepage

GTrunSec commented 9 months ago

I embedded Quarto in Jupyenv several months ago, and it works perfectly.

natsukium commented 9 months ago

> It has certain limitations that I find challenging to overcome.

What are the limitations? To be honest, I'm a light user of Jupyter, so I can't imagine a complex use case in the real world, but I'm interested in adopting it as much as possible.

While fixing pkgs.jupyter, I felt that I would like to unify its API with that of pkgs.jupyter-console. In any case, it would be nice to refactor them, including the modules, for the sake of documentation.

teto commented 9 months ago

I've tried to create a multi-kernel Jupyter experience, and it was a bit maddening. Here is my experience writing the following shell:

let
  python3PkgsFn = ps: [
    ps.numpy
    ps.scipy
    ps.matplotlib
    ps.graphviz
    ps.tensorflow
    ps.scikit-learn
    ps.keras
  ];
  pyEnv = pkgs.python3.withPackages python3PkgsFn;

  kernel-definitions = {
    python3 = mkPythonKernelDef pyEnv;
    haskell = mkHaskellKernel ghcEnv;
  };

  # creates a folder with kernels/ subpath
  allKernels = pkgs.jupyter-kernel.create {
    definitions = kernel-definitions;
  };

  /* for now must contain ihaskell */
  ghcEnv = pkgs.haskellPackages.ghcWithPackages (p: with p; [
    aeson
    ihaskell
    ihaskell-blaze
  ]);

  /* jupyter console is an attrset of 2 functions */
  jup-console = pkgs.jupyter-console.mkConsole {
    definitions = kernel-definitions;
    # why ?
    kernel = null;
  };

in
pkgs.mkShell {
  buildInputs = [
    allKernels
    # -notebook
    pkgs.jupyter

    jup-console
  ];

  shellHook = ''
    echo "ohayo "
    echo "${allKernels} "
    export JUPYTER_PATH=${allKernels}
    export IPYTHONDIR=_ipythondir
    export JUPYTER_CONFIG_DIR=_jupyter_cfg
  '';
};

1/ When I enter the shell without setting the value of IPYTHONDIR, it gets set to /ipython, which triggers:

$ jupyter-console
/nix/store/5s5djlfcb1l655fwkydm8hrsksckqrh3-python3-3.11.6-env/lib/python3.11/site-packages/IPython/paths.py:69: UserWarning: IPython parent '/' is not a writable location, using a temp directory.
  warn("IPython parent '{0}' is not a writable location,"
/nix/store/r4rwxai2g45i3ax51ic7k6flsvhg3yz3-python3-3.11.6-env/bin/python3.11: No module named ipykernel_launcher
^CTraceback (most recent call last):
  File "/nix/store/1rh6y3hlhwj3idgg16149wg0nf5sw8vy-python3.11-jupyter-console-6.6.3/bin/.jupyter-console-wrapped", line 9, in <module>
  sys.exit(main())
$ echo $IPYTHONDIR
/ipython

2/ Similarly, if I don't set JUPYTER_CONFIG_DIR, I get:

$ jupyter-console

.py", line 26, in ensure_dir_exists
    os.makedirs(path, mode=mode)
  File "<frozen os>", line 225, in makedirs
PermissionError: [Errno 13] Permission denied: '/jupyter'
$ echo $JUPYTER_CONFIG_DIR 
/jupyter

3/ The ihaskell derivation in all-packages.nix is backwards: instead of being haskellPackages.ihaskell, it's a Python environment with an ihaskell kernel.

4/ Similarly, when I entered my shell and ran jupyter kernelspec list --debug, it listed only a Python kernel even though my JUPYTER_PATH contained both Haskell and Python kernels. It turns out that the jupyter wrapper sets the path unconditionally (no prefix, no suffix): "--set JUPYTER_PATH ${jupyter-kernel.create { inherit definitions; }}". While discarding the JUPYTER_PATH from the environment could be defensible from a purity point of view, I found it surprising too. The package should have no kernel by default, and the API should make it clearer that it embeds kernels, for example jupyter.withKernels() or wrapJupyter jupyter kernels; jupyter.override { definitions ? defaultDef } is treacherous.

5/ I've got the same feeling about pkgs/applications/editors/jupyter/console.nix: it looks like it doesn't know what it wants to be. It would be cleaner to have wrapJupyter jupyter-console kernels or jupyter-console.withKernels(kernels).
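
A minimal sketch of the wrapping idea from points 4/ and 5/, for illustration only. wrapJupyter is a hypothetical helper named after the proposal above, not an existing nixpkgs function; the sketch assumes the standard symlinkJoin and makeWrapper tooling, and it prepends the kernels directory with --prefix so that a JUPYTER_PATH already present in the environment is kept rather than discarded.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Hypothetical helper (NOT part of nixpkgs): re-export `drv` with every
  # executable wrapped so that `kernels` is prepended to JUPYTER_PATH.
  wrapJupyter = drv: kernels:
    pkgs.symlinkJoin {
      name = "${drv.pname or "jupyter"}-with-kernels";
      paths = [ drv ];
      nativeBuildInputs = [ pkgs.makeWrapper ];
      postBuild = ''
        for prog in "$out"/bin/*; do
          # --prefix keeps whatever JUPYTER_PATH the caller already exported,
          # unlike the --set used by the current top-level wrapper.
          wrapProgram "$prog" --prefix JUPYTER_PATH : "${kernels}"
        done
      '';
    };
in
# Example use: jupyter-console plus the default kernel set.
wrapJupyter pkgs.python3Packages.jupyter-console (pkgs.jupyter-kernel.create { })
```

Whether prefixing or replacing JUPYTER_PATH is the right default is exactly the purity trade-off raised in point 4/; the sketch only shows that the choice can be made explicit in one place.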

What I would like to do:

  1. add jupyter/lib.nix with some common functions
  2. add a jupyter/README.md on how to use the API (which we can migrate to the official doc later on)
  3. maybe standardize interpreter wrappers to have a .jupyterKernel attribute ?
  4. I quite like jupyter.withPackages(...), but to generalize this, I wonder whether wrapJupyter jupyter kernels would be easier. Opinions?
  5. remove pkgs/applications/editors/jupyter/console.nix ?

As for backwards compatibility, we would throw exceptions that show the new way to define the equivalent?

I am willing to prepare an MR if there is interest/agreement.

teto commented 9 months ago

To sum up, the saner approach is to ignore all the surprising top-level wrappers and use the base packages. This does what I expect in a shell: it finds the kernels in JUPYTER_PATH (in the final derivation, we would wrap it).

pkgs.mkShell {

  /* don't use the top-level 'jupyter' as it needs to be overridden with proper kernel-definitions */
  buildInputs = [
    allKernels

    pkgs.python3Packages.notebook
    pkgs.python3Packages.jupyter-console
  ];

  shellHook = ''
    echo "ohayo "
    echo "${allKernels} "
    export JUPYTER_PATH=${allKernels}
    export IPYTHONDIR=_ipythondir
    export JUPYTER_CONFIG_DIR=_jupyter_cfg
  '';
};

thomasjm commented 9 months ago

Hmm, I don't think you should be running into this much trouble making an environment. If you look at the jupyter-all derivation, you can see how this is intended to work at present:

  jupyter-all = jupyter.override {
    definitions = {
      clojure = clojupyter.definition;
      octave = octave-kernel.definition;
      # wolfram = wolfram-for-jupyter-kernel.definition; # unfree
    };
  };

Now, there's a bit of a wrinkle where you're trying to add additional packages to your Python environment. My instinct would be to handle this as part of the Python kernel definition.

My general plan for this stuff has been as follows:

# Jupyter lab with two kernels, each with custom packages
# (list elements that are function applications need parentheses in Nix)
jupyter.withKernels (kernels: [
  (kernels.python.withPackages (ps: [ ps.matplotlib ps.scipy ]))
  (kernels.haskell.withPackages (ps: [ ps.aeson ]))
])

# Jupyter console with one kernel with custom packages
jupyter-console.withKernel (jupyter-kernels.python.withPackages (ps: [ ps.matplotlib ps.scipy ]))

teto commented 9 months ago

> Hmm, I don't think you should be running into this much trouble making an environment. If you look at the jupyter-all derivation, you can see how this is intended to work at present:

I don't think it should be that complex either. I don't like the use of "override" in your example; in my mind, it's reserved for changing things one shouldn't.

I think I agree with all your points. Some packages will have to be inherited from elsewhere, e.g. the ihaskell derivation is generated from Hackage. If @natsukium agrees, let's do that?

Also to keep track of my hardship/discoveries:

mkHaskellKernel = ghcEnv: {
  displayName = "Haskell";
  argv = [
    "${ghcEnv}/bin/ihaskell"

    # Without this line I can't import packages from ghcEnv.
    # The ihaskell flake does `-l $(${env}/bin/ghc --print-libdir)`;
    # we hardcode the (guessed) path here instead to avoid a wrapper.
    "-l"
    "${ghcEnv}/lib/ghc-${ghcEnv.version}"

    "kernel"
    "{connection_file}"
    "+RTS"
  ];
  # (the snippet as posted ends here; any remaining attributes are not shown)
};

To sum up, ihaskell is hard to use in nixpkgs because of this trick, and the flake implementation is convoluted.

teto commented 9 months ago

My previous setup listing Python packages would work with jupyter-console but not with jupyter-notebook, because it hit https://github.com/NixOS/nixpkgs/issues/255923: my shell had a plain Python that couldn't find the files expected by notebook. So I had to create a Python environment, as done in the PR:

# I need an env, else I hit https://github.com/NixOS/nixpkgs/issues/255923, aka
# the main python can't find notebook files.
pyEnv = pkgs.python3.withPackages (ps: [
  ps.notebook
  ps.jupyter-console
]);

But this would in turn break, because the (other) Python environment used as my Python kernel was missing ipykernel and thus triggered:

# the kernel launches -m ipykernel_launcher, so if you don't add it
# you get /nix/store/r4rwxai2g45i3ax51ic7k6flsvhg3yz3-python3-3.11.6-env/bin/python3.11: No module named ipykernel_launcher

So I ended up with this flake, which works with jupyter-console and jupyter-notebook with both Python and Haskell kernels: https://gist.github.com/teto/4d12998d734f982e27f48d8bb001c8ae
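
For readers who hit the same "No module named ipykernel_launcher" error, here is a minimal sketch of a Python kernel definition whose environment includes ipykernel. It is an illustration, not the content of the gist: the attribute names language, logo32 and logo64 reflect my understanding of what pkgs.jupyter-kernel.create expects and should be checked against the nixpkgs revision in use, and the package list is arbitrary.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Environment backing the kernel. ipykernel must be part of it, because the
  # kernel is started with `python -m ipykernel_launcher`.
  kernelEnv = pkgs.python3.withPackages (ps: [
    ps.ipykernel
    ps.numpy
    ps.matplotlib
  ]);

  # Kernel definition; attribute names other than displayName and argv are
  # assumptions about the jupyter-kernel.create interface.
  pythonKernel = {
    displayName = "Python 3 (custom env)";
    language = "python";
    argv = [
      "${kernelEnv}/bin/python3"
      "-m"
      "ipykernel_launcher"
      "-f"
      "{connection_file}"
    ];
    logo32 = null;
    logo64 = null;
  };
in
# Builds a directory with a kernels/ subpath, suitable for JUPYTER_PATH.
pkgs.jupyter-kernel.create {
  definitions = { python3 = pythonKernel; };
}
```

The environment that runs notebook or jupyter-console and the environment that backs the kernel are separate here, which is why each one needs its own complete package list.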

There is no reason it should be this complex except manpower. Now that we have a task force, let's make jupyter+nix enjoyable!

teto commented 8 months ago

@natsukium do you agree with the plan ?

At least with the easy part:

Create pkgs/applications/editors/jupyter-kernels/default.nix, containing a map of all kernel names => kernel definitions. Expose this in all-packages.nix as jupyter-kernels.

I have some cycles to allocate to that. I am interested in adding, for instance, mKernelForIhaskell functions in jupyter/lib.nix.

It might be interesting to create a #jupyter room in the nixpkgs Matrix space, no?

natsukium commented 8 months ago

Sorry, I missed the notification. The proposed plan seems fine to me. Could you please proceed as such? #jupyter room sounds interesting.

GTrunSec commented 8 months ago

@teto Since maintainers have not responded to my PRs in jupyenv for a long time, I've decided to support this issue thoroughly. My current plan is to first migrate the kernels to nixpkgs for unified maintenance.

teto commented 8 months ago

Thanks to the moderation team, we have a room to discuss implementation details. I've invited all of you: https://matrix.to/#/#jupyter:nixos.org