Closed picoroma closed 2 years ago
@picoroma What exactly was your procedure?
Did you first run the buildup of a single-node system (hana_count = 1), change the parameter to 2 and run terraform apply a second time?
--> This is not going to work that easily.
or
Did you run a clean new buildup with hana_count = 2 and hana_ha_enabled = true?
--> These are the 2 parameters that control whether the deployment is HA (>1 and true).
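To make the rule above concrete, here is a minimal sketch in Python of the check it describes: a deployment counts as HA only when hana_count is greater than 1 and hana_ha_enabled is true. The function name is_ha_deployment, the use of a plain dict for the settings, and the defaults applied to missing keys are assumptions for illustration; none of this is taken from the project's code.

```python
def is_ha_deployment(settings: dict) -> bool:
    """Return True when the settings describe an HA HANA deployment.

    Hypothetical helper: 'settings' stands in for the values from
    terraform.tfvars. The defaults used for missing keys (1 and False)
    are assumptions, not the project's documented defaults.
    """
    # hana_count may be written as a number or as a quoted string ("2").
    hana_count = int(settings.get("hana_count", 1))
    ha_enabled = settings.get("hana_ha_enabled", False)
    # Both conditions must hold: more than one node AND HA switched on.
    return hana_count > 1 and ha_enabled is True


if __name__ == "__main__":
    print(is_ha_deployment({"hana_count": 2, "hana_ha_enabled": True}))
    print(is_ha_deployment({"hana_count": "2"}))
```

With two nodes but no hana_ha_enabled, the helper reports a non-HA setup, which is the combination the reporter describes as failing.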
@yeoldegrove: Usually I destroy a deployment before running a new one. So the sequence I usually run is: terraform plan, terraform apply, terraform destroy, and then a new terraform apply (sometimes without a new plan).
When I used hana_count = 2 and hana_ha_enabled = true, both HANA nodes were installed correctly (with HANA System Replication), but I got an error on the OS cluster: the SUSE cluster was not deployed correctly. Do I need to set some other parameter for AWS? I have performed another deployment. The errors seem to be related to the monitoring of the HANA cluster. I attach the last part of the deployment output with the error: SUSE-HA-DEPLOY-ERROR.txt
@picoroma Thanks for the log... In the meantime I was able to reproduce your error.
At the moment I suspect the issue to be related to the change of instance type in https://github.com/SUSE/ha-sap-terraform-deployments/pull/822.
old instance type (xen)
ip-10-0-0-5:~ # python3 -c "from crmsh import utils; print(utils.detect_cloud());"
amazon-web-services
new instance type (nitro/kvm)
ip-10-0-0-5:~ # python3 -c "from crmsh import utils; print(utils.detect_cloud());"
None
The above Python code (https://github.com/ClusterLabs/crmsh/blob/347f815c6565d0f8d8d5472a5640cfc1ce78ccb5/crmsh/utils.py#L2054) is used by https://github.com/SUSE/salt-shaptools/blob/835d199a6117b0b5657f14ae8fc296af7709f382/salt/modules/crmshmod.py#L707 and https://github.com/SUSE/salt-shaptools/blob/835d199a6117b0b5657f14ae8fc296af7709f382/salt/states/crmshmod.py#L595, which is in turn used by e.g. https://github.com/SUSE/saphanabootstrap-formula/blob/038ee4d6b542365e790c47e942efabedc196fa72/templates/cluster_resources.j2#L4 to decide which cloud is used.
As you see above, this code is currently broken... I will try to fix it and/or come up with a workaround. One workaround would be going back to the old instance types... but these were abandoned because we had other issues with these (see PR).
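As a rough illustration of how a detection could tolerate both instance types, here is a minimal sketch in Python. It is not the crmsh code and not the proposed fix. It rests on an assumption that is not confirmed in this thread: that Xen-based instances expose "amazon" in the DMI system-version field, while Nitro/KVM instances expose it in system-manufacturer. The function name detect_cloud_from_dmi and the sample DMI values are hypothetical.

```python
from typing import Optional


def detect_cloud_from_dmi(system_version: str,
                          system_manufacturer: str) -> Optional[str]:
    """Guess the cloud provider from two DMI strings.

    Hypothetical helper, not part of crmsh. Assumption: on AWS, "amazon"
    appears in system-version on Xen instances and in system-manufacturer
    on Nitro/KVM instances, so checking both fields covers both cases.
    """
    fields = (system_version.lower(), system_manufacturer.lower())
    if any("amazon" in field for field in fields):
        return "amazon-web-services"
    # No match on either field: report "not detected", like the None above.
    return None


if __name__ == "__main__":
    # Sample values are assumed for illustration only.
    print(detect_cloud_from_dmi("4.2.amazon", "Xen"))
    print(detect_cloud_from_dmi("Not Specified", "Amazon EC2"))
    print(detect_cloud_from_dmi("", "QEMU"))
```

The sketch takes the strings as arguments rather than calling dmidecode itself, so it can be tried without root access or an EC2 instance.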
Can I temporarily "force" the use of the old instance type? In that case, which instance type should I choose, for example?
WRONG: hana_instancetype = "r6i.xlarge"
RIGHT: hana_instancetype = "????"
Can I use r5.2xlarge or r5.4xlarge? Is a workaround provided in the meantime?
You could try using the old instance types here: https://github.com/SUSE/ha-sap-terraform-deployments/pull/822/files#diff-c4686714aa47252c9b02d1319b932187b5d7e2182279ecf8f69935a469a3469dL211
e.g. hana_instancetype = "r3.8xlarge"
but... #822 came for a reason and after a reboot your nodes might not come up.
I proposed a fix here: https://github.com/ClusterLabs/crmsh/pull/952 Let's see how fast we can get this merged.
@picoroma https://github.com/SUSE/salt-shaptools/pull/87 is a workaround that is merged and available with ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/v8"
Please try out if this fixes it for you.
In the meantime we're working on getting the crmsh fix available in SLES.
Until the 8.0.1 release, you have to use the develop branch to use the workaround.
No. I tried again. This time HANA was not installed at all on the 2nd node. Even the cluster features are missing on the 2nd node. HANA was installed correctly only on the 1st node. Monitoring was installed, but the exporter is missing on the 2nd node.
The deployment finished with this error:
module.hana_node.module.hana_provision.null_resource.provision[0]: Creation complete after 24m23s [id=656053581]
│ Error: remote-exec provisioner error
│
│ with module.hana_node.module.hana_provision.null_resource.provision[1],
│ on ..\generic_modules\salt_provisioner\main.tf line 78, in resource "null_resource" "provision":
│ 78: provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_213658815.sh": Process exited with status 1
On the 2nd node, the salt-result.log file reports:
Summary for local
Succeeded: 37 (changed=31)
Failed: 0
Total states run: 37
Total run time: 214.288 s
Mon Mar 21 08:49:52 UTC 2022::vmhana02::[INFO] predeployment done
local: Data failed to compile:
Rendering SLS 'base:hana.monitoring' failed: while constructing a mapping
in "
I performed another attempt with log level = info and monitoring disabled.
This is the output of the deployment:
module.hana_node.module.hana_provision.null_resource.provision[1] (remote-exec): [ERROR ] b'nameserver vmhana02:30001 not responding.'
module.hana_node.module.hana_provision.null_resource.provision[1]: Still creating... [54m44s elapsed]
module.hana_node.module.hana_provision.null_resource.provision[1] (remote-exec): [INFO ] b'adding site ...'
module.hana_node.module.hana_provision.null_resource.provision[1] (remote-exec): [INFO ] b'collecting information ...'
module.hana_node.module.hana_provision.null_resource.provision[1] (remote-exec): [INFO ] b'unable to contact primary site host vmhana01:40002. internal error,location=vmhana01:40002. Trying old-style port (port offset +100)...vmhana01:40002'
module.hana_node.module.hana_provision.null_resource.provision[1] (remote-exec): [INFO ] b'unable to contact primary site; to vmhana01:30102; original error: internal error,location=vmhana01:30102; '
module.hana_node.module.hana_provision.null_resource.provision[1] (remote-exec): [INFO ] b'failed. trace file nameserver_vmhana02.00000.000.trc may contain more error details.'
module.hana_node.module.hana_provision.null_resource.provision[1] (remote-exec): [ERROR ] b'nameserver vmhana02:30001 not responding.'
module.hana_node.module.hana_provision.null_resource.provision[1]: Still creating... [54m54s elapsed]
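Since output like the above mixes progress lines, INFO entries and repeated ERROR entries, a small filter can help when reading a long deployment log. The following is a minimal sketch in Python, assuming the saved output has one entry per line and follows the "(remote-exec): [LEVEL ] message" shape seen above. The function name error_messages and the regular expression are hypothetical and not part of the project.

```python
import re
from typing import List

# Matches e.g. "... (remote-exec): [ERROR ] b'nameserver ... not responding.'"
# The padding inside the brackets varies, so any whitespace is accepted.
ENTRY_RE = re.compile(r"\(remote-exec\): \[(?P<level>[A-Z]+)\s*\] (?P<msg>.*)")


def error_messages(output: str) -> List[str]:
    """Return the distinct ERROR messages in first-seen order."""
    seen: List[str] = []
    for line in output.splitlines():
        match = ENTRY_RE.search(line)
        if match is None or match.group("level") != "ERROR":
            continue  # progress lines and non-ERROR entries are skipped
        message = match.group("msg")
        if message not in seen:  # retries repeat the same error many times
            seen.append(message)
    return seen


if __name__ == "__main__":
    sample = "\n".join([
        "m.provision[1] (remote-exec): [ERROR ] b'nameserver vmhana02:30001 not responding.'",
        "m.provision[1]: Still creating... [54m44s elapsed]",
        "m.provision[1] (remote-exec): [INFO ] b'adding site ...'",
        "m.provision[1] (remote-exec): [ERROR ] b'nameserver vmhana02:30001 not responding.'",
    ])
    for message in error_messages(sample):
        print(message)
```

The duplicate check is deliberate: the provisioner retries, so the same error repeats every few seconds and would otherwise drown out any other message.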
@picoroma https://github.com/SUSE/ha-sap-terraform-deployments/releases/tag/8.0.1 has just been released, including fixes for this issue. It would be cool if you could confirm that it works now.
Tried with 2 nodes, with HA enabled and monitoring enabled. I still get an error. The 2nd node is not managed: no filesystem is attached and no HANA is installed. HANA was installed only on the 1st node. The deployment finished with this message:
module.hana_node.module.hana_provision.null_resource.provision[0] (remote-exec): Succeeded: 48 (changed=31)
module.hana_node.module.hana_provision.null_resource.provision[0] (remote-exec): Failed: 0
module.hana_node.module.hana_provision.null_resource.provision[0] (remote-exec): -------------
module.hana_node.module.hana_provision.null_resource.provision[0] (remote-exec): Total states run: 48
module.hana_node.module.hana_provision.null_resource.provision[0] (remote-exec): Total run time: 764.523 s
module.hana_node.module.hana_provision.null_resource.provision[0] (remote-exec): Wed Mar 23 15:26:41 UTC 2022::vmhana01::[INFO] deployment done
module.hana_node.module.hana_provision.null_resource.provision[0]: Creation complete after 25m33s [id=209775395]
╷
│ Error: remote-exec provisioner error
│
│ with module.hana_node.module.hana_provision.null_resource.provision[1],
│ on ..\generic_modules\salt_provisioner\main.tf line 65, in resource "null_resource" "provision":
│ 65: provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_971219650.sh": Process exited with status 1
I can send the salt*.log files if needed, but I still see errors with the SUSE modules, for example:
Please check if the URIs defined for this repository are pointing to a valid repository. Skipping repository 'SLE-Product-SLES_SAP15-SP2-Updates' because of the above error. Repository 'SLE-Module-Server-Applications15-SP2-Pool' is invalid. [Server_Applications_Module_x86_64:SLE-Module-Server-Applications15-SP2-Pool|plugin:/susecloud?credentials=Server_Applications_Module_x86_64&path=/repo/SUSE/Products/SLE-Module-Server-Applications/15-SP2/x86_64/product/] Valid metadata not found at specified URL History:
Please check if the URIs defined for this repository are pointing to a valid repository. Skipping repository 'SLE-Module-Server-Applications15-SP2-Pool' because of the above error. Repository 'SLE-Module-Server-Applications15-SP2-Updates' is invalid. [Server_Applications_Module_x86_64:SLE-Module-Server-Applications15-SP2-Updates|plugin:/susecloud?credentials=Server_Applications_Module_x86_64&path=/repo/SUSE/Updates/SLE-Module-Server-Applications/15-SP2/x86_64/update/] Valid metadata not found at specified URL History:
@picoroma Your latest reported issues are most likely related to the SUSEConnect or registercloudguest infrastructure (or code).
Which image are you using exactly and is it PAYG or BYOL?
A short test from my side (just now) did not show any issues with os_image = "suse-sles-sap-15-sp2" (which is the default PAYG image) and aws_region = "us-east-2".
In my experience, whether you hit these kinds of issues also depends on the cloud provider, the time of day and the availability zone.
I think this can be related to the OS image I'm using, which is a BYOL one. I had a similar issue even with the AWS Launch Wizard script for SAP. I opened a case with AWS and they say:
I have received an update from our internal team confirming that the problem was due to recent SUSE change in the registration of BYOS AMIs : https://www.suse.com/c/byos-instances-and-the-suse-public-cloud-update-infrastructure/
Our team has informed me that AWS Launch Wizard service will rollout the fix to handle SUSE updates in registration of BYOS AMI’s to all regions by 3/4. I hope that this is helpful.
I do not know if this helps you troubleshoot. Anyway, I will perform a new deployment using the PAYG OS image and give you feedback.
@picoroma Closing this. We're happy to investigate/reopen if you still have this issue.
Used cloud platform: AWS
Used SLES4SAP version: SLES15SP2
Used client machine OS: Windows10
Expected behaviour vs observed behaviour: I have successfully deployed a single-node HANA system, with and without the monitoring option. When I add a second HANA node (hana_count = "2"), the installation does not finish: HANA is installed neither on node 1 nor on node 2. If I also add the parameter hana_ha_enabled = true, the HANA installation works fine, but the HA cluster is not installed correctly.
My question is: which options are mandatory to install 2 or more HANA nodes without an HA cluster? And which parameters are mandatory for HANA multi-node with an HA cluster?
THX