KCL-BMEIS / KCL-DGX-cluster-documentation


setup cluster connection bypasses load-balancing #8

Closed mhubii closed 4 months ago

mhubii commented 4 months ago

The instructions tell users to connect to h1.isd.kcl.ac.uk; see e.g. https://github.com/KCL-BMEIS/KCL-DGX-cluster-documentation/tree/main/1-Setup-cluster-connection#connecting-for-the-first-time

This is not what Andrew intends, since he load-balances connections. Users are supposed to connect to aicheadnode.isd.kcl.ac.uk.

@mikelitu, @david-stojanovski

mikelitu commented 4 months ago

This is only for the first connection. I think it is mandatory to connect to the h1 node to reset the password, but we can make this clearer if the current wording does not. Any suggestions on how to rephrase?

david-stojanovski commented 4 months ago

Maybe add a warning like:

[!NOTE] This is only for the first connection, afterwards use:

mhubii commented 4 months ago

I haven't tried the tutorial; however, Andrew wants users to connect randomly, which is why we should connect to aicheadnode.isd.kcl.ac.uk.

If that is the outcome of the tutorial, then it is fine.
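
For illustration, the split discussed in this thread could look like the following in `~/.ssh/config`. This is only a sketch, not the repository's actual config: the aliases `dgx-first-login` and `dgx` and the user name `your_username` are hypothetical placeholders, and the need to use h1 for the initial password reset is taken from the comment above rather than verified here.

```sshconfig
# First connection only (per the thread: used to reset the password).
# The alias and user name are placeholders.
Host dgx-first-login
    HostName h1.isd.kcl.ac.uk
    User your_username

# Every later connection: the load-balanced address.
Host dgx
    HostName aicheadnode.isd.kcl.ac.uk
    User your_username
```

Under these assumptions, `ssh dgx-first-login` would be used once and `ssh dgx` from then on.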

mikelitu commented 4 months ago

Great idea @david-stojanovski!

Thanks for pointing this out @mhubii.

mhubii commented 4 months ago

On another note, why is there no HostName anywhere in the configs? See https://github.com/KCL-BMEIS/KCL-DGX-cluster-documentation/blob/580f51e995ce556116c6806ce2fccb55a4280f3f/1-Setup-cluster-connection/config#L1

I'm really not an expert, but how do we know where to connect to here? My configs look along the lines of:

# bouncer server
Host bouncer
    HostName bouncer.isd.kcl.ac.uk
    User mh19
    IdentityFile ~/.ssh/bouncer_key

# headnode
Host headnode
    HostName aicheadnode.isd.kcl.ac.uk
    User mhuber
    IdentityFile ~/.ssh/headnode
    ProxyCommand ssh -q -W %h:%p bouncer

# headnode local
Host headnode_local
    HostName aicheadnode.isd.kcl.ac.uk
    User mhuber
    IdentityFile ~/.ssh/headnode

mikelitu commented 4 months ago

It is not necessary, as we are already on the school's network, connecting either via the BH network or the bouncer. But I will add it for consistency with the other examples.
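
As background on general OpenSSH behaviour (not a statement about the linked config, which is not reproduced in this thread): when a `Host` block has no `HostName`, ssh connects to the name typed on the command line. The sketch below contrasts the two forms; the user name `your_username` is a hypothetical placeholder.

```sshconfig
# No HostName: `ssh aicheadnode.isd.kcl.ac.uk` matches this block and
# the name given on the command line is used as the address.
Host aicheadnode.isd.kcl.ac.uk
    User your_username

# Explicit HostName: `ssh headnode` matches this block and connects
# to aicheadnode.isd.kcl.ac.uk instead of the literal name "headnode".
Host headnode
    HostName aicheadnode.isd.kcl.ac.uk
    User your_username
```

Running `ssh -G headnode` prints the options ssh would actually use, including the resolved `hostname`, without opening a connection.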

mhubii commented 4 months ago

awesome, thanks both