eth-cscs / firecrest

BSD 3-Clause "New" or "Revised" License

Different System Where My Slurm Resides #176

Closed komraad closed 1 year ago

komraad commented 1 year ago

Can anyone suggest a fix or help with this?

image

Thank you.

jpdorsch commented 1 year ago

Hi @scavenqqer ,

Could you give us more information on this? For instance, how did you configure the FirecREST env to add the coare system?

Thanks,

Cheers

komraad commented 1 year ago

Hello @jpdorsch

I changed common.env as per the instructions, to test against my cluster.

https://github.com/eth-cscs/firecrest/tree/master/deploy/demo

image

The Dummy Cluster is still up and I have not removed it yet; I just want to connect the other cluster (coare).

Thank you.

jpdorsch commented 1 year ago

Ok, what that configuration does is add your system to FirecREST. But this doesn't mean the cluster is reachable via FirecREST. The VM where you installed FirecREST has to be able to reach the cluster via port 22 (SSH). This is configured in the variables:

F7T_SYSTEMS_INTERNAL_COMPUTE='<dns_or_ip_of_cluster>:<port_ssh>;<another_dns_or_ip_of_cluster>:<port_ssh>'
F7T_SYSTEMS_INTERNAL_STORAGE='<dns_or_ip_of_cluster>:<port_ssh>;<another_dns_or_ip_of_cluster>:<port_ssh>'
F7T_SYSTEMS_INTERNAL_UTILITIES='<dns_or_ip_of_cluster>:<port_ssh>;<another_dns_or_ip_of_cluster>:<port_ssh>'

As you can see in the examples, the format is <DNS or IP>:<port where SSH is running>
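
To make the format concrete, here is a minimal Python sketch that splits such a value into (host, port) pairs. The function name `parse_systems` and the sample addresses are hypothetical and not part of FirecREST; the sketch only illustrates the `<host>:<port>;<host>:<port>` layout described above.

```python
def parse_systems(value: str) -> list[tuple[str, int]]:
    """Split 'host:port;host:port' into a list of (host, port) tuples."""
    systems = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        # Split on the last ':' so the port is always the final field.
        host, sep, port = entry.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected <host>:<port>, got {entry!r}")
        systems.append((host, int(port)))
    return systems


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(parse_systems("192.168.220.12:22;cluster.example.org:22"))
```

An entry without a port raises `ValueError`, which is a quick way to catch a malformed value before starting the containers.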

Hope this helps. Cheers

komraad commented 1 year ago

Hello @jpdorsch

Thank you for the response. Actually it is already adjusted.

image

It is pointed to my Cluster IP.

komraad commented 1 year ago

Also, I cannot adjust the SSH config of the cluster, so I just created an RSA SSH key on my VM and appended the public key on the cluster.

Could there be another solution, or is this already correct?

jpdorsch commented 1 year ago

Ok, the configuration is fine. Now, for this part, I will ask you to check whether it's possible to reach port 22 of your cluster from the FirecREST containers (that is, from the VM where FirecREST is installed).

You can check this by running netcat (for instance) from the VM: nc -nv 202.0.0.0 22. You should get something like tcp succeeded!
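
If netcat is not available on the VM, the same check can be sketched in Python with the standard `socket` module. The helper name `can_reach` and the 5-second timeout are assumptions for illustration; it only tests that a TCP connection can be opened and says nothing about whether SSH authentication will succeed.

```python
import socket


def can_reach(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_reach("202.0.0.0", 22)` from the VM plays the same role as the `nc` command above, with 202.0.0.0 replaced by the address of your cluster.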

komraad commented 1 year ago

Whoooooo.

It failed. image

komraad commented 1 year ago

I tried to upload, but this is the error. image

komraad commented 1 year ago

Hello @jpdorsch ,

It is still not connecting to my cluster after adding the SSH keys. BTW, I used an RSA-type key.

It's also connected to the other cluster.

image

image

Is there anything more to change? Thank you.

jpdorsch commented 1 year ago

Hi @scavenqqer ,

The way that FirecREST handles SSH connections from microservices to the cluster is by using a Certificate Authority microservice (certificator) and making the cluster's SSH configuration trust the certificator signed certificates.

Basically, you have 2 pairs of keys (as you can see in https://github.com/eth-cscs/firecrest/tree/master/deploy/test-build/environment/keys)

You will have to create these 2 pairs, and then mount them in the containers as is in the demo environment.
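
A minimal sketch of creating the two pairs, assuming `ssh-keygen` is installed on the VM. The file names `ca_key` and `user-key` follow the demo keys directory; the ed25519 key type, the empty passphrase, and the helper names are assumptions for illustration, so adjust them to your site's policy.

```python
import subprocess
from pathlib import Path


def keygen_command(path: Path, key_type: str = "ed25519") -> list[str]:
    """Build an ssh-keygen argv that writes `path` (private) and `path`.pub (public)."""
    # -N "" sets an empty passphrase (an assumption); -q suppresses the banner.
    return ["ssh-keygen", "-t", key_type, "-f", str(path), "-N", "", "-q"]


def create_key_pairs(directory: Path, run: bool = False) -> list[list[str]]:
    """Return the commands for both pairs; execute them only when run=True."""
    commands = [keygen_command(directory / name) for name in ("ca_key", "user-key")]
    if run:
        directory.mkdir(parents=True, exist_ok=True)
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands
```

With `run=False` the function only returns the commands, so they can be reviewed first; `create_key_pairs(Path("keys"), run=True)` would actually invoke `ssh-keygen` for both pairs.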

Now, for the cluster you will need to configure SSH as you can see in the repo for the demo cluster (https://github.com/eth-cscs/firecrest/blob/f9034c71d6651c372f79bd56a65f10395c050d35/deploy/test-build/cluster/ssh/sshd_config#L101)

Match Address 192.168.0.0/16 
    TrustedUserCAKeys /etc/ssh/ca-key.pub
    # only accept this CA, and not regular priv/pub SSH keys
    PubkeyAcceptedKeyTypes ssh-rsa-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com
    PermitRootLogin  no
    #DenyGroups root bin admin sys
    #AllowUsers user1
    MaxAuthTries 1
    AllowTcpForwarding no
    ForceCommand /ssh_command_wrapper.sh
    PermitTTY no
    PermitTunnel no

komraad commented 1 year ago

Hi @scavenqqer ,

The way that FirecREST handles SSH connections from microservices to the cluster is by using a Certificate Authority microservice (certificator) and making the cluster's SSH configuration trust the certificator signed certificates.

Basically, you have 2 pairs of keys (as you can see in https://github.com/eth-cscs/firecrest/tree/master/deploy/test-build/environment/keys)

  • ca_key (private) and ca_key.pub (public)
  • user-key (private) and user-key.pub (public)

You will have to create these 2 pairs, and then mount them in the containers as is in the demo environment.

Now, for the cluster you will need to configure SSH as you can see in the repo for the demo cluster (https://github.com/eth-cscs/firecrest/blob/f9034c71d6651c372f79bd56a65f10395c050d35/deploy/test-build/cluster/ssh/sshd_config#L101)

Match Address 192.168.0.0/16 
    TrustedUserCAKeys /etc/ssh/ca-key.pub
    # only accept this CA, and not regular priv/pub SSH keys
    PubkeyAcceptedKeyTypes ssh-rsa-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com
    PermitRootLogin  no
    #DenyGroups root bin admin sys
    #AllowUsers user1
    MaxAuthTries 1
    AllowTcpForwarding no
    ForceCommand /ssh_command_wrapper.sh
    PermitTTY no
    PermitTunnel no
  • In Match Address you have to put the IP address or DNS of the VM where the microservices are running.
  • In TrustedUserCAKeys, the path to the public key part of the certificate authority machine.
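
As a rough illustration of the first point, the sketch below checks whether a client IP falls inside the network given to Match Address, using Python's standard `ipaddress` module. The helper name `match_applies` is hypothetical, and the check only covers plain CIDR patterns such as 192.168.0.0/16 from the config above, not the full pattern syntax that sshd accepts.

```python
import ipaddress


def match_applies(client_ip: str, match_cidr: str) -> bool:
    """Return True if client_ip lies inside the CIDR network match_cidr."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(match_cidr)


if __name__ == "__main__":
    # 192.168.0.0/16 is the demo value from the sshd_config above.
    print(match_applies("192.168.220.5", "192.168.0.0/16"))
    print(match_applies("202.0.0.0", "192.168.0.0/16"))
```

If the VM's address is outside the configured network, the Match block does not apply to its connections, so the demo value has to be replaced with the address of your own VM.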

If this part cannot be changed on the cluster, since it uses RSA-type keys for connections, how can it integrate with FirecREST?

komraad commented 1 year ago

@jpdorsch

I've tried testing it with another system/cluster, using the 2 pairs of keys, but it reports an invalid path after creating the user on that system/cluster.

image