Closed lpalovsky closed 2 years ago
I could reproduce this issue. It seems quite odd as the deployment itself finished successfully:
vmhana01:~ # tail -1 /var/log/salt-result.log
Wed May 25 10:39:34 UTC 2022::vmhana01::[INFO] deployment done
And shortly (about 30s) after that, something stops HANA:
prdadm@vmhana01:/usr/sap/PRD/HDB00/vmhana01/trace> cat sapstart.log | grep Stop
(17550) **** 2022/05/25 10:40:05 Caught Signal to Stop all Programs. ****
(17550) Stop Child Process: 17557
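The gap between the two log lines above can be checked with a small helper. This is only a sketch: `gap_seconds` is a hypothetical name, GNU `date` is assumed, and the timestamps are copied by hand from the `salt-result.log` and `sapstart.log` excerpts (the latter is assumed to be in UTC as well).

```shell
# Print the number of seconds between two timestamps (GNU date assumed).
gap_seconds() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# "deployment done" vs. "Caught Signal to Stop all Programs"
gap_seconds "2022-05-25 10:39:34 UTC" "2022-05-25 10:40:05 UTC"   # prints 31
```

If the trace file is written in local time rather than UTC, the second argument would need the matching time zone.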
Hi, thanks for looking into this. Yes, it is strange indeed. I think I saw multiple restarts of the primary DB in the deployment logs. I am going to look into those once again and will be back once I find anything. Btw, if I run the deployment with hana_ha_enabled = false, it will deploy the databases without setting up HA, right? I am thinking of trying to set up the cluster manually as well.
If I run the deployment with: hana_ha_enabled = false It will deploy databases without setting up the HA right?
correct
I narrowed it down to the package version of SAPHanaSR.
A deployment with SAPHanaSR-0.154.1-4.14.1 works.
A deployment with SAPHanaSR-0.155.0-4.17.1 fails.
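To see which of the two versions a node has, the installed package string can be compared against the ones reported here. A minimal sketch: `classify_saphanasr` is a hypothetical helper, and it only knows the two versions named in this thread; any other version is reported as not covered.

```shell
# Classify a SAPHanaSR package string using only the versions reported in this thread.
classify_saphanasr() {
  case "$1" in
    SAPHanaSR-0.154.1-4.14.1*) echo "reported working" ;;
    SAPHanaSR-0.155.0-4.17.1*) echo "reported failing" ;;
    *) echo "not covered in this thread" ;;
  esac
}

# On a deployed node the installed package could be queried with rpm:
#   classify_saphanasr "$(rpm -q SAPHanaSR)"
classify_saphanasr "SAPHanaSR-0.155.0-4.17.1"
```

The trailing `*` in each pattern allows for the architecture suffix that `rpm -q` appends to the package name.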
Hmm, that is interesting... It might not be related, but we are seeing a similar issue on a non-terraform deployment as well. After the HANA installation, the primary node is demoted and stopped. The secondary, however, is promoted and running. The difference might be that on the terraform side the primary fails more quickly, while the secondary is not yet replicated and therefore cannot be promoted. We are still looking into it with my colleague and will come back once there is some finding.
I narrowed it down to the package version of SAPHanaSR. A deployment with SAPHanaSR-0.154.1-4.14.1 works. A deployment with SAPHanaSR-0.155.0-4.17.1 fails.
I confirm that from my side also for HANA HA on GCP.
More troubleshooting:
location constraints from the cluster configurations.
ab-vmhana01:~ # crm resource refresh rsc_SAPHana_PRD_HDB00 ab-vmhana01
ab-vmhana01, but the system replication failed:
ab-vmhana02:~ # crm_mon -rnf1
Cluster Summary:
* Stack: corosync
* Current DC: ab-vmhana02 (version 2.0.5+20201202.ba59be712-150300.4.21.1-2.0.5+20201202.ba59be712) - partition with quorum
* Last updated: Wed Jun 1 09:29:00 2022
* Last change: Wed Jun 1 09:25:29 2022 by hacluster via crmd on ab-vmhana01
* 2 nodes configured
* 8 resource instances configured
Node List:
Inactive Resources:
Migration Summary:
5. Stopping and starting the cluster services did not fix the issue.
I narrowed it down to the package version of SAPHanaSR. A deployment with SAPHanaSR-0.154.1-4.14.1 works. A deployment with SAPHanaSR-0.155.0-4.17.1 fails.
Interesting. Is there a bug open for it?
The HANA not coming up at all (rc=7) will be handled in #863.
For the bug about SAPHanaSR-0.155.0-4.17.1 I opened #865.
Used cloud platform Azure
Used SLES4SAP version SLES15SP3
Used client machine OS Linux
Expected behaviour vs observed behaviour
ha-terraform-deployment version: 8.1.0 SAP HANA version: 5.57, 6.60
Deployment of the HA SAP HANA cluster results in a non-working cluster: the primary database on vmhana01 does not start, and vmhana02 is started as a replica but without the data fully synced.
Starting primary database (vmhana01) manually works and after cleaning up resources everything is back to normal.
In salt-deployment.log I see quite a few messages about the cluster not being available:
And something like a command being executed with incorrect usage:
I experienced the same problem on GCP; I haven't tried AWS yet.
How to reproduce Specify the step by step process to reproduce the issue. This usually would look like something like this:
terraform.tfvars file based on terraform.tfvars.example
The usage of the provisioning_log_level = "info" option in the terraform.tfvars file is interesting to get more information during the terraform commands execution. So it is suggested to run the deployment with this option to see what happens before opening any ticket.
Used terraform.tfvars
Logs The logs mentioned below are quite long for both nodes. Should I paste them here or rather send via separate channel?
This is the list of the required logs (each of the deployed machines will have all of them):
Additional logs might be required to deepen the analysis on HANA or NETWEAVER installation. They will be asked specifically in case of need.
Thanks for your time and help!