Open tstaerk opened 2 years ago
Hi Thorsten, could you provide us a bit more information please?
a) which guide: the GitHub one or the SUSE getting started documentation?
b) which CSP (I guess GCE)?
c) your settings of the various SSH flags in tfvars
@petersatsuse, Thorsten uses the SBP guide I created for GCP.
@tstaerk, I would advise the following:
- Run the git pull command if you are in doubt before creating the environment.
- Share your terraform.tfvars file with us. It contains all the configuration that you used for your environment.
Some other required information: the provisioning_log_level = "info" option in the terraform.tfvars file is useful to get more information during the execution of the terraform commands. So it is suggested to run the deployment with this option to see what happens before opening any ticket. Here is the list of the required logs (each of the deployed machines will have all of them):
just realized I did not define a VPC... if there is only one, can't it use this?
OK, I am using GCP and the following tfvars file:
project = "thorstenstaerk-suse-terraforms"
gcp_credentials_file = "sa.json"
region = "europe-west1"
os_image = "suse-sap-cloud/sles-15-sp2-sap"
publickey = "/home/admin/.ssh/id_rsa.pub"
privatekey = "/home/admin/.ssh/id_rsa"
cluster_ssh_pub = "salt://sshkeys/cluster.id_rsa.pub"
cluster_ssh_key = "salt://sshkeys/cluster.id_rsa"
ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/v8/"
provisioning_log_level = "info"
pre_deployment = true
bastion_enabled = false
machine_type = "n1-highmem-16"
hana_inst_master = "thorstenstaerk-sap-media-extracted/"
hana_master_password = "SAP_Pass123"
@tstaerk: I have just completed a successful deployment using the most recent version, 8.1.0, with the following terraform.tfvars file:
project = "<PROJECT ID>"
gcp_credentials_file = "sa-key.json"
region = "us-west1"
os_image = "suse-sap-cloud/sles-15-sp2-sap"
public_key = "<PATH TO THE SSH KEY>/gcp_key.pub"
private_key = "<PATH TO THE SSH KEY>/gcp_key"
cluster_ssh_pub = "salt://sshkeys/cluster.id_rsa.pub"
cluster_ssh_key = "salt://sshkeys/cluster.id_rsa"
ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:v8/"
provisioning_log_level = "info"
pre_deployment = true
bastion_enabled = false
hana_inst_master = "<GCP BUCKET>/HANA/2.0/SPS05/51054623"
hana_master_password = "YourSAPPassword1234"
hana_primary_site = "NUE"
hana_secondary_site = "FRA"
I see that we use almost the same configuration. Please ensure that you use the most recent version, 8.1.0, from the master branch. I would suggest using a fresh clone to ensure there are no configuration conflicts, or at least executing git pull before starting your deployment.
git pull tells me "already up to date"
Can you please try a fresh clone before digging into the issue?
deleted and re-checked out
OK, your terraform.tfvars and mine are identical except for passwords, names, and your two lines:
hana_primary_site = "NUE"
hana_secondary_site = "FRA"
I repeated with my old terraform.tfvars and I get:
module.hana_node.module.hana-load-balancer[0].google_compute_health_check.health-check: Creating...
╷
│ Error: Error creating Network: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/global/networks/demo-network' already exists, alreadyExists
│
│   with google_compute_network.ha_network[0],
│   on infrastructure.tf line 27, in resource "google_compute_network" "ha_network":
│   27: resource "google_compute_network" "ha_network" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/disks/demo-hana-data-1' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.data[1],
│   on modules/hana_node/main.tf line 12, in resource "google_compute_disk" "data":
│   12: resource "google_compute_disk" "data" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/disks/demo-hana-data-0' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.data[0],
│   on modules/hana_node/main.tf line 12, in resource "google_compute_disk" "data":
│   12: resource "google_compute_disk" "data" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/disks/demo-hana-backup-0' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.backup[0],
│   on modules/hana_node/main.tf line 20, in resource "google_compute_disk" "backup":
│   20: resource "google_compute_disk" "backup" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/disks/demo-hana-backup-1' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.backup[1],
│   on modules/hana_node/main.tf line 20, in resource "google_compute_disk" "backup":
│   20: resource "google_compute_disk" "backup" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/disks/demo-hana-software-1' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.hana-software[1],
│   on modules/hana_node/main.tf line 28, in resource "google_compute_disk" "hana-software":
│   28: resource "google_compute_disk" "hana-software" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/disks/demo-hana-software-0' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.hana-software[0],
│   on modules/hana_node/main.tf line 28, in resource "google_compute_disk" "hana-software":
│   28: resource "google_compute_disk" "hana-software" {
│
╵
╷
│ Error: Error creating HealthCheck: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/global/healthChecks/demo-hana-health-check' already exists, alreadyExists
│
│   with module.hana_node.module.hana-load-balancer[0].google_compute_health_check.health-check,
│   on modules/load_balancer/main.tf line 5, in resource "google_compute_health_check" "health-check":
│   5: resource "google_compute_health_check" "health-check" {
╵
after deleting all the resources above and restarting terraform apply, I now get:
╷
│ Error: Error creating InstanceGroup: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/instanceGroups/demo-hana-primary-group' already exists, alreadyExists
│
│   with module.hana_node.google_compute_instance_group.hana-primary-group,
│   on modules/hana_node/main.tf line 60, in resource "google_compute_instance_group" "hana-primary-group":
│   60: resource "google_compute_instance_group" "hana-primary-group" {
│
╵
╷
│ Error: Error creating InstanceGroup: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/instanceGroups/demo-hana-secondary-group' already exists, alreadyExists
│
│   with module.hana_node.google_compute_instance_group.hana-secondary-group,
│   on modules/hana_node/main.tf line 66, in resource "google_compute_instance_group" "hana-secondary-group":
│   66: resource "google_compute_instance_group" "hana-secondary-group" {
│
╵
╷
│ Error: file provisioner error
│
│   with module.hana_node.null_resource.hana_node_provisioner[1],
│   on modules/hana_node/salt_provisioner.tf line 23, in resource "null_resource" "hana_node_provisioner":
│   23: provisioner "file" {
│
│ timeout - last error: SSH authentication failed (root@35.187.176.254:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
╵
╷
│ Error: file provisioner error
│
│   with module.hana_node.null_resource.hana_node_provisioner[0],
│   on modules/hana_node/salt_provisioner.tf line 23, in resource "null_resource" "hana_node_provisioner":
│   23: provisioner "file" {
│
│ timeout - last error: SSH authentication failed (root@130.211.104.240:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
╵
ssh cannot work, as Cloud Shell does not have a network connection to a host inside a GCP project
130.211.104.240 is demo-vmhana01
@tstaerk, please execute the terraform destroy command to destroy your environment before any new attempt to create a new environment with the terraform apply command.
When you ssh to the HANA node using the public IP address, you need to use the SSH key configured in the terraform.tfvars file. Here is the command format:
ssh -i <SSH PRIVATE KEY> root@<HANA_NODE_PUBLIC_IP_ADDRESS>
Hi, I do not call ssh. I get an error that ssh is not possible and I think this is because of the isolation between cloud shell and VMs.
ok, makes sense - you use the public IP address. Here is what I get:
admin_@cloudshell:~$ ssh -i .ssh/id_rsa root@130.211.104.240
The authenticity of host '130.211.104.240 (130.211.104.240)' can't be established.
ECDSA key fingerprint is SHA256:YgYUATM68uQX/KEEXAqXUm18U+BMR9/1M1iDic7PfVI.
Are you sure you want to continue connecting (yes/no/[fingerprint])? Host key verification failed.
Three possible troubleshooting steps:
- Use the -v option with the SSH command to gather more info.
It is perfectly fine that this fails. Just make sure you delete the old host key from your known_hosts. A bit more context: https://linuxhint.com/host-key-verification-failed-mean/
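For reference, the stale entry can be removed with OpenSSH's `ssh-keygen -R`; a minimal sketch, assuming the default known_hosts path (the IP address is the HANA node's public IP from this thread):

```shell
#!/bin/sh
# Minimal sketch: drop a stale host key so the next connection can record the new one.
# Assumes OpenSSH; HOST is the HANA node's public IP mentioned in this thread.
HOST=130.211.104.240
KNOWN_HOSTS="${KNOWN_HOSTS:-$HOME/.ssh/known_hosts}"

if [ -f "$KNOWN_HOSTS" ]; then
    # -R removes every entry matching HOST and keeps a known_hosts.old backup
    ssh-keygen -R "$HOST" -f "$KNOWN_HOSTS"
fi
```

Afterwards, `ssh -i <SSH PRIVATE KEY> root@130.211.104.240` will prompt to accept the node's current host key.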
Two questions come to mind:
- what is salt://sshkeys/cluster.id_rsa.pub? Where does it come from? Can I check if mine is right?
This is the cluster's ssh key. Normally you don't have to tamper with this.
- you said it worked for you, and it uses ssh. So you must have a firewall rule, right?
You CAN connect via ssh/port-22 so this will not be a firewall issue.
@tstaerk The ssh keys that are used by terraform to connect via ssh and run salt are these:
public_key = "/home/admin_/.ssh/id_rsa.pub"
private_key = "/home/admin_/.ssh/id_rsa"
Did you create these and are you using these also in your test?
@tstaerk In addition to @yeoldegrove notes and questions, you may manually attach the SSH public keys to your nodes as a troubleshooting step.
added the authorized_keys file manually to both nodes, now the install looks like it's doing something!
install finished, hdbsql answers my SQL queries. Please make sure the authorized_keys file gets created automatically!
@tstaerk There is of course already code that handles this https://github.com/SUSE/ha-sap-terraform-deployments/blob/main/gcp/modules/hana_node/main.tf#L155
Are you sure you created the key files and set the correct variables in terraform.tfvars?
reproducing it now
@yeoldegrove: looking at https://github.com/SUSE/ha-sap-terraform-deployments/blob/main/gcp/modules/hana_node/main.tf#L155, you only add the ssh key to the instance's metadata, so passwordless ssh login would only work if the project is set to os_login=false, right? Have you ever tested it with os_login=true?
@tstaerk I still do not get which exact problem you're having and trying to solve. Could you elaborate on that?
ssh keys are added to the instance's metadata the usual way, as you pointed out. Are you using the "Cloud Console"? AFAIK most users use their workstations to deploy this. https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata points out that keys added by the Cloud Console will be removed. Maybe this is your issue?
Also, I am not sure what you mean by os_login=true/false. Where would I set this?
you would go to the Cloud Console, search for "Metadata", select it, and there set the key os_login with the value false. Then the ssh key set via https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata will be respected.
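If you prefer the command line over the console, the same project-wide metadata entry can be set with the gcloud CLI; a sketch, assuming gcloud is installed and authenticated (the project ID is the one from this thread):

```shell
# Sketch: set enable-oslogin=FALSE project-wide so that instance-level
# sshKeys metadata (as written by terraform) is honored again.
gcloud compute project-info add-metadata \
    --project thorstenstaerk-suse-terraforms \
    --metadata enable-oslogin=FALSE
```

Note that an organization policy enforcing OS Login takes precedence over project metadata, as discussed later in this thread.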
@tstaerk are you talking about https://console.cloud.google.com/compute/metadata where I could set e.g. https://cloud.google.com/compute/docs/oslogin/set-up-oslogin ?
Just so that I do not miss anything out... Could you please sum up what exactly is not working for you (your use case) and how you solve it exactly?
Would just setting enable-oslogin=FALSE in https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata fix it for you?
We found the error: we had an organisation policy (constraints/compute.requireOsLogin) active that enforced every project to have enable-oslogin=true.
This led to the ssh error. Also, host key verification was not the problem:
admin_@cloudshell:~/ha-sap-terraform-deployments/gcp (tstaerk-tf-demo)$ ssh -o StrictHostKeyChecking=no root@34.79.69.80
Warning: Permanently added '34.79.69.80' (ECDSA) to the list of known hosts.
root@34.79.69.80: Permission denied (publickey).
The issue was that the public ssh key was not automatically added to the HANA node's authorized_keys. To change this, we set enable-oslogin=false in the project metadata:
then, ssh'ing worked and the key could be found in authorized_keys:
admin_@cloudshell:~/ha-sap-terraform-deployments/gcp (tstaerk-tf-demo)$ ssh -o StrictHostKeyChecking=no root@34.79.69.80
SUSE Linux Enterprise Server 15 SP2 for SAP Applications x86_64 (64-bit)
As "root" (sudo or sudo -i) use the:
- zypper command for package management
- yast command for configuration management
Management and Config: https://www.suse.com/suse-in-the-cloud-basics
Documentation: https://www.suse.com/documentation/sles-15/
Community: https://community.suse.com/
Have a lot of fun...
demo-hana02:~ # cat .ssh/authorized_keys
# Added by Google
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfWjWgE1NkXnmv0UgAkm+zHnJ2UJgTVpMEAlc3Fo+tH6U1BsPL++ceiE+mAAjcT41j7Ew5N4qyranPSTQOvrLSGvCITP4edAJlbrh4JOzy5/aNP/EfWZiprtytrkdBEzd0gbhg+Bh98FlEUoxLtZSFsP2090zI7hTuT9DEB3eQknMkR9g+JsgGcDd0t4kdERaLZp+spkPCJF3LQ2h+9ZbmHqwBjzYLsJLRMma3y+aU80IHONBOEaX+ab+1vR1CuxMBwRjSlDkfRVBuxMWnj+ipQaLjiMLFaGbANFxPFj4AaeDnYO/jnKUaIRQOEAvpgjN9r5hVsRT0I+cpBvTpqcrx admin_@cs-485070161371-default-boost-wds4w
So, one solution would be to manually copy the public ssh key into the OS's authorized_keys file. Another option could be to check whether constraints/compute.requireOsLogin is enforced and, if yes, tell the user that they have to manually copy the ssh key to all nodes.
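That org-policy check could be scripted; a sketch, assuming the gcloud CLI with permission to read effective policies (project ID taken from this thread):

```shell
# Sketch: warn before deployment if compute.requireOsLogin is enforced.
PROJECT=thorstenstaerk-suse-terraforms
if gcloud resource-manager org-policies describe compute.requireOsLogin \
        --project "$PROJECT" --effective 2>/dev/null | grep -q 'enforced: true'; then
    echo "OS Login is enforced by an org policy:"
    echo "copy the public ssh key to each node's authorized_keys manually."
fi
```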
Hi @yeoldegrove,
thanks for all your contributions here. @ab-mohamed and I really invested a lot of work debugging an "it all boils down to: it doesn't work" issue, and arrived at a conclusion: if you have an org policy requiring OS Login, you get an error message like the one in the description. Solution: remove this org policy that enforces OS Login. If you cannot do this, manually go to the HANA nodes and add the public key to authorized_keys. Would it be possible to document this or implement a respective error message/policy check?
@tstaerk OK, so this is a global setting which is not directly related to this project but gets in the way...
Could you check if it would be sufficient to set metadata = { enable-oslogin = false, sshKeys = "..." } here for every compute instance deployed by this project?
It would have to be added to every module that builds up compute instances... like here: https://github.com/SUSE/ha-sap-terraform-deployments/blob/8f3cc1846638d661c3fec3a714cd84de5fd7abf3/gcp/modules/hana_node/main.tf#L154
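Sketched against that line, the extra metadata entry might look roughly like this; the variable name is illustrative, not the module's actual one:

```hcl
# Hypothetical fragment for the google_compute_instance resources in each module.
# Caveat: an enforced compute.requireOsLogin org policy would still override this.
metadata = {
  enable-oslogin = false
  sshKeys        = "root:${var.public_key}" # illustrative variable name
}
```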
If this does not work, we should definitely write something into the README.md. A contribution/PR from your side would be appreciated here, as you're way more into the topic right now ;)
2-3 sentences with a bit of context in the https://github.com/SUSE/ha-sap-terraform-deployments/tree/main/gcp#troubleshooting section should be enough.
If you have an organization policy that forbids it, you cannot set metadata = { enable-oslogin = false, sshKeys = "..." }
OK, I propose that we add the error message to the documentation and explain how to check if the issue is about the organizational policy. And how to resolve it if you have the Org Policy Admin role.
@tstaerk Do you want to make a PR (would be preferred by me as you're more into the topic) or shall I write something up (and let you review it) ?
I work closely with @ab-mohamed; I think we could come up with something.
Following your guide, I get the following when I run terraform apply:
module.hana_node.null_resource.hana_node_provisioner[1]: Still creating... [5m0s elapsed]
╷
│ Error: file provisioner error
│
│   with module.hana_node.null_resource.hana_node_provisioner[1],
│   on modules/hana_node/salt_provisioner.tf line 23, in resource "null_resource" "hana_node_provisioner":
│   23: provisioner "file" {
│
│ timeout - last error: SSH authentication failed (root@34.140.41.24:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
╵