COSIMA / cosima-cookbook

Framework for indexing and querying ocean-sea ice model output.
https://cosima-recipes.readthedocs.io/en/latest/
Apache License 2.0

Running notebooks on Raijin? #116

Closed AndyHoggANU closed 4 years ago

AndyHoggANU commented 5 years ago

I know that some time ago, James Munroe had ideas about how we could run notebooks, or perhaps just python code, on Raijin using cookbook functionality.

I think that he would start an interactive job, allocate the right amount of memory, use dask to set up workers and then run his job there.

If we could implement this in the cookbook, it would allow us to do more of the big, or memory intensive, calculations away from the VDI, noting that each individual can only log onto a single VDI node at any time.

I'm not sure how it could be done, but it would be worth looking into at some stage.

jmunroe commented 5 years ago

Today, the best starting point for this would be dask-jobqueue (https://dask-jobqueue.readthedocs.io/en/latest/). For those interested, a good introduction is https://medium.com/pangeo/dask-jobqueue-d7754e42ca53
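
For readers who have not used dask-jobqueue, here is a minimal sketch of the workflow described above: request memory and cores through PBS, start dask workers as batch jobs, and connect a client to them. The helper names `pbs_cluster_options` and `start_cluster` are hypothetical, the resource values are placeholders, and site-specific settings such as queue and project are assumed to come from the local dask configuration.

```python
def pbs_cluster_options(cores, memory_gb, walltime="01:00:00"):
    """Build keyword arguments for dask_jobqueue.PBSCluster (hypothetical helper)."""
    return {"cores": cores, "memory": f"{memory_gb} GB", "walltime": walltime}


def start_cluster(options, jobs=1):
    """Submit `jobs` PBS jobs running dask workers and return (cluster, client)."""
    # Imported here so the option helper above works without dask installed.
    from dask.distributed import Client
    from dask_jobqueue import PBSCluster

    cluster = PBSCluster(**options)
    cluster.scale(jobs=jobs)   # ask PBS for this many worker jobs
    client = Client(cluster)   # later dask computations run on those workers
    return cluster, client


options = pbs_cluster_options(cores=10, memory_gb=70)
# On a machine with PBS and dask-jobqueue available:
# cluster, client = start_cluster(options, jobs=2)
```

Once the client exists, xarray/dask computations in the same Python session are sent to the PBS workers rather than run on the login or VDI node.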

AndyHoggANU commented 5 years ago

Thanks James - yes, this looks like it could (help to) do the job.

navidcy commented 5 years ago

Here's an example of how I do it (after @jmunroe's help). Perhaps some of the pythonistas in the house (e.g., @angus-g or @aidanheerdegen) can help with automating the procedure? (After a glimpse at the jupyter notebook it'll be clear what precisely I mean by automating...)

The way you do it is as follows:

On a raijin login node you call

[nc3020@raijin1 nc3020]$ jupyter-notebook --no-browser --ip 0.0.0.0
[I 16:20:45.254 NotebookApp] JupyterLab extension loaded from /short/v45/nc3020/miniconda3/envs/mom6analysis/lib/python3.6/site-packages/jupyterlab
[I 16:20:45.254 NotebookApp] JupyterLab application directory is /short/v45/nc3020/miniconda3/envs/mom6analysis/share/jupyter/lab
[I 16:20:45.256 NotebookApp] Serving notebooks from local directory: /g/data1a/v45/nc3020
[I 16:20:45.256 NotebookApp] The Jupyter Notebook is running at:
[I 16:20:45.256 NotebookApp] http://(raijin1 or 127.0.0.1):8888/
[I 16:20:45.256 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

You then open a browser and go to the URL mentioned above (which varies depending on which raijin node you are logged in to). In the example above: http://raijin1.nci.org.au:8888/

Then see the jupyter notebook here: https://gist.github.com/navidcy/81e0bed4d8485d7c4bb57a13e24299d1

The dask status window when I computed the mean looked like this:

[Screenshot: Screen Shot 2019-05-15 at 4 52 08 pm]

navidcy commented 5 years ago

One remark: I can't really see, though, how this issue is related to the cosima-cookbook. The issue raised here by @AndyHoggANU is much more general and goes outside the scope of cosima-cookbook.

josuemtzmo commented 5 years ago

I've been able to replace the lines:

!/sbin/ifconfig
cluster = PBSCluster(cores=10, memory='70 GB', ip='10.9.105.1', dashboard_address='192.43.239.21')
cluster

by:

import socket
import fcntl
import struct
def get_interface_ip(ifname):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    return socket.inet_ntoa(fcntl.ioctl(
            s.fileno(),
            0x8915,
            struct.pack('256s', bytes(ifname[:15], 'utf-8'))
        )[20:24])

cluster = PBSCluster(cores=10, memory='70 GB', ip=get_interface_ip('ib0'), dashboard_address=get_interface_ip('vlan192'))
cluster

This lets you avoid copying and pasting IPs. I will look into how to include this in the script that launches the jupyter notebook.

serazing commented 5 years ago

That's awesome guys. I am currently testing it to compute eddy fluxes with NEMO outputs stored in zarr format. It runs noticeably better than the same notebook on VDI. In particular, I ran into HDF and Blosc decompression issues on VDI (#138) that do not seem to arise on Raijin nodes.

Is there any effort to compile everything into one script for setting Jupyter on Raijin?

I would also suggest trying JupyterLab instead of Jupyter Notebook. I just changed one line of code in the jupyter_vdi script and it works fine on VDI.

AndyHoggANU commented 5 years ago

Hi @serazing - yes, we have made a couple of half-baked attempts with JupyterLab, but haven't settled on anything. Maybe a way forward here is for you to push your suggestion to the cookbook once you get something working, and @jmunroe can help to implement it?

josuemtzmo commented 5 years ago

Hello,

So far I’ve modified the Python script that connects to JVDI so that it starts a session on Raijin. I haven't opened a pull request yet, but I could submit one in the next few days. There is still some work to do to make selecting the dashboard port easier: there are only six Raijin desks, and if a lot of people start using this tool we will need to generate random ports for the Xarray-dask dashboard.
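
One common way to get a port that is unlikely to clash is to let the operating system pick one by binding to port 0. The sketch below is only an illustration using the standard library: `find_free_port` is a hypothetical helper, not part of the script, and there is a small window in which another process could take the port between this check and the dashboard starting.

```python
import socket


def find_free_port(host=""):
    """Ask the OS for a currently unused TCP port on `host` (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))            # port 0 means "any free port"
        return s.getsockname()[1]    # the port number the OS actually assigned


port = find_free_port()
# Assumption: the chosen port is appended to the dashboard address as "host:port".
dashboard_address = f":{port}"
```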

The current script can be found at: https://github.com/josuemtzmo/cosima-cookbook/blob/master/scripts/jupyter_raijin.py

I will try to explore the possibility to use JupyterLab.

Additionally, we should try to automate looking up the required IPs when creating a cluster client. So far @navidcy and I are using:

import socket
import fcntl
import struct
def get_interface_ip(ifname):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    return socket.inet_ntoa(fcntl.ioctl(
            s.fileno(),
            0x8915,
            struct.pack('256s', bytes(ifname[:15], 'utf-8'))
        )[20:24]) 

cluster = PBSCluster(cores=XXX, memory='XXX GB', ip=get_interface_ip('ib0'), dashboard_address=get_interface_ip('vlan192'), local_directory='XXXX')

But I'm sure we could preload this tiny script while creating a new jupyter raijin session.
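
Since interface names such as 'ib0' and 'vlan192' are specific to a machine, a preloaded helper might try several names in turn. This is a sketch under that assumption: `first_interface_ip` is a hypothetical function, and `lookup` stands for any function like `get_interface_ip` above that raises `OSError` when the interface does not exist.

```python
def first_interface_ip(ifnames, lookup):
    """Return the IP of the first interface in `ifnames` that `lookup` resolves.

    `lookup` is expected to raise OSError for a missing interface, which is
    what the ioctl call in get_interface_ip does. Returns None if none match.
    """
    for name in ifnames:
        try:
            return lookup(name)
        except OSError:
            continue  # interface not present on this node, try the next one
    return None


# Example with a stand-in lookup (the address below is made up):
def fake_lookup(name):
    if name == "eth0":
        return "10.0.0.5"
    raise OSError("no such device")


ip = first_interface_ip(["ib0", "eth0"], fake_lookup)
```

On a real node, `get_interface_ip` would be passed in place of `fake_lookup`.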

Cheers, Josué

jmunroe commented 5 years ago

Rather than each of us running our own jupyter notebook servers and having to worry about tripping over each other's ports (for both jupyter and the dask dashboard), we could use JupyterHub. That tool proxies the network traffic so that everything goes through a single port (usually a web port such as :80 or :443) to reach :8888 and :8787, while keeping each user's sessions separate.

cosima-cookbook is fundamentally about helping users analyze model data so this is consistent with the overall goals of the project.

serazing commented 5 years ago

I agree with @jmunroe, JupyterHub seems pretty appropriate for Raijin. This is also what is currently used on different clouds and HPCs by Pangeo members.

Has anyone got in touch with the CMS team or NCI people about that? I think it could also be good to interact with and provide feedback to the Pangeo community (I'm already a member).

aidanheerdegen commented 5 years ago

NCI are looking into providing JupyterHub infrastructure as part of the new Data Enhanced Virtual Laboratory (DEVL) program targeting CMIP6.

@jmunroe has been invited as an alpha tester I believe.

serazing commented 4 years ago

Hi, has anyone tried to run notebooks on Gadi? I can start a Jupyter instance on Gadi but I can't connect to it from a browser.

navidcy commented 4 years ago

Yeap. See cosima cookbook scripts folder.

navidcy commented 4 years ago

@angus-g I suggest we close this issue. It's been (mostly) addressed.

The solution is to use the gadi_jupyter script, which sits in https://github.com/coecms/nci_scripts

angus-g commented 4 years ago

I think so, either the script or a future JupyterHub solution. It shouldn't really be specific to the cookbook, anyway.