SUSE / ha-sap-terraform-deployments

Automated SAP/HA Deployments in Public/Private Clouds
GNU General Public License v3.0

HANA cluster unclean after deployment #857

Closed: lpalovsky closed this issue 2 years ago

lpalovsky commented 2 years ago

Used cloud platform: Azure

Used SLES4SAP version: SLES15SP3

Used client machine OS: Linux

Expected behaviour vs observed behaviour

ha-sap-terraform-deployments version: 8.1.0. SAP HANA version: 5.57, 6.60

Deployment of an HA SAP HANA cluster results in a non-working cluster: the primary database on vmhana01 does not start, and vmhana02 is started as a replica but without data fully synced.

vmhana01:/home/azureuser # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: vmhana01 (version 2.0.5+20201202.ba59be712-150300.4.21.1-2.0.5+20201202.ba59be712) - partition with quorum
  * Last updated: Wed May 25 08:05:12 2022
  * Last change:  Wed May 25 08:04:16 2022 by root via crm_attribute on vmhana01
  * 2 nodes configured
  * 7 resource instances configured

Node List:
  * Online: [ vmhana01 vmhana02 ]

Full List of Resources:
  * stonith-sbd (stonith:external/sbd):  Started vmhana01
  * Resource Group: g_ip_PRD_HDB00:
    * rsc_ip_PRD_HDB00  (ocf::heartbeat:IPaddr2):    Started vmhana01
    * rsc_socat_PRD_HDB00   (ocf::heartbeat:azure-lb):   Started vmhana01
  * Clone Set: msl_SAPHana_PRD_HDB00 [rsc_SAPHana_PRD_HDB00] (promotable):
    * Slaves: [ vmhana02 ]
    * Stopped: [ vmhana01 ]
  * Clone Set: cln_SAPHanaTopology_PRD_HDB00 [rsc_SAPHanaTopology_PRD_HDB00]:
    * Started: [ vmhana01 vmhana02 ]

Failed Resource Actions:
  * rsc_SAPHana_PRD_HDB00_start_0 on vmhana01 'not running' (7): call=36, status='complete', exitreason='', last-rc-change='2022-05-24 15:08:00Z', queued=0ms, exec=2359ms

Starting the primary database (vmhana01) manually works, and after cleaning up the resources everything is back to normal:

Full List of Resources:
  * stonith-sbd (stonith:external/sbd):  Started vmhana01
  * Resource Group: g_ip_PRD_HDB00:
    * rsc_ip_PRD_HDB00  (ocf::heartbeat:IPaddr2):    Started vmhana01
    * rsc_socat_PRD_HDB00   (ocf::heartbeat:azure-lb):   Started vmhana01
  * Clone Set: msl_SAPHana_PRD_HDB00 [rsc_SAPHana_PRD_HDB00] (promotable):
    * rsc_SAPHana_PRD_HDB00 (ocf::suse:SAPHana):     Master vmhana01 (Monitoring)
    * Slaves: [ vmhana02 ]
  * Clone Set: cln_SAPHanaTopology_PRD_HDB00 [rsc_SAPHanaTopology_PRD_HDB00]:
    * Started: [ vmhana01 vmhana02 ]
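For reference, the manual recovery was roughly the following (a sketch, assuming SID PRD, instance number 00, and the resource names shown above):

# start the primary database manually as the HANA administration user
su - prdadm -c "HDB start"

# clear the failed start action so Pacemaker manages the resource again
crm resource cleanup rsc_SAPHana_PRD_HDB00 vmhana01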

In salt-deployment.log I see quite a few messages about the cluster not being available:

2022-05-25 08:55:21,003 [salt.loaded.int.module.cmdmod:410 ][INFO    ][3850] Executing command '/usr/sbin/crm' in directory '/root'
2022-05-25 08:55:21,653 [salt.loaded.int.module.cmdmod:847 ][ERROR   ][3850] Command '/usr/sbin/crm' failed with return code: 1
2022-05-25 08:55:21,653 [salt.loaded.int.module.cmdmod:849 ][ERROR   ][3850] stdout: Could not connect to the CIB: Transport endpoint is not connected
crm_mon: Error: cluster is not available on this node
ERROR: status: crm_mon (rc=102):
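The "Could not connect to the CIB" message usually just means the cluster stack is not up on the node at that moment; a quick check (a sketch):

# verify whether pacemaker/corosync are actually running at that point
systemctl status pacemaker corosync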

And what looks like a command being executed with incorrect usage:

2022-05-25 08:55:21,654 [salt.loaded.int.module.cmdmod:410 ][INFO    ][3850] Executing command '/usr/sbin/crm' in directory '/root'
2022-05-25 08:55:22,252 [salt.loaded.int.module.cmdmod:849 ][DEBUG   ][3850] stdout: usage: init [options] [STAGE]

Initialize a cluster from scratch. This command configures
a complete cluster, and can also add additional cluster
nodes to the initial one-node cluster using the --nodes
option.

optional arguments:
  -h, --help            Show this help message
  -q, --quiet           Be quiet (don't describe what's happening, just do it)
  -y, --yes             Answer "yes" to all prompts (use with caution, this is
                        destructive, especially those storage related
                        configurations and stages. The /root/.ssh/id_rsa key
                        will be overwritten unless the option "--no-overwrite-
                        sshkey" is used)

I experienced the same problem on GCP; I haven't tried AWS yet.

How to reproduce

  1. Create the terraform.tfvars file based on terraform.tfvars.example (the file used is attached below).
  2. Run terraform plan and terraform apply (see the commands after this list).
  3. Log in to vmhana01 as root and check with crm status.
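In shell form, the reproduction boils down to (a sketch; the SSH user is the admin_user from the tfvars below):

terraform init
terraform plan
terraform apply

# once the deployment finishes, on the first HANA node:
ssh azureuser@vmhana01
sudo crm status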

Using the provisioning_log_level = "info" option in the terraform.tfvars file is useful to get more information during the execution of the terraform commands, so it is suggested to run the deployment with this option before opening any ticket.

Used terraform.tfvars

#################################
# ha-sap-terraform-deployments project configuration file
# Find all the available variables and definitions in the variables.tf file
#################################

# Region where to deploy the configuration
az_region = "westeurope"

# Use an already existing resource group
#resource_group_name = "my-resource-group"

# Use an already existing virtual network
#vnet_name = "my-vnet"

# Use an already existing subnet in this virtual network
#subnet_name = "my-subnet"

# Use an already existing subnet for Netapp in this virtual network (optional, will be created otherwise)
# subnet_netapp_name = "my-netapp-subnet"

# vnet address range in CIDR notation
# Only used if the vnet is created by terraform or the user doesn't have read permissions in this
# resource. To use the current vnet address range set the value to an empty string
# To define custom ranges
#vnet_address_range = "10.74.0.0/16"
#subnet_address_range = "10.74.1.0/24"
#subnet_netapp_address_range = "10.74.3.0/24"
# Or to use already existing address ranges
#vnet_address_range = ""
#subnet_address_range = ""
#subnet_netapp_address_range = ""

#################################
# General configuration variables
#################################

# Deployment name. This variable is used to complement the name of multiple infrastructure resources adding the string as suffix
# If it is not used, the terraform workspace string is used
# The name must be unique among different deployments
deployment_name = "lpalovsky"

# Add the "deployment_name" as a prefix to the hostname.
#deployment_name_in_hostname = false

# Admin user for the created machines
admin_user = "azureuser"

# If BYOS images are used in the deployment, SCC registration code is required. Set `reg_code` and `reg_email` variables below
# By default, all the images are PAYG, so these next parameters are not needed
#reg_code = "<<REG_CODE>>"
#reg_email = "<<your email>>"
reg_code = "DUMMY"

# To add additional modules from SCC. None of them is needed by default
#reg_additional_modules = {
#    "sle-module-adv-systems-management/12/x86_64" = ""
#    "sle-module-containers/12/x86_64" = ""
#    "sle-ha-geo/12.4/x86_64" = "<<REG_CODE>>"
#}

# Default os_image. This value is not used if the specific values are set (e.g.: hana_os_image)
# Run the next command to get the possible options and use the 4th column value (version can be changed by `latest`)
# az vm image list --output table --publisher SUSE --all
# BYOS example with sles4sap 15 sp2 (this value is a pattern, it will select the latest version that matches this name)
#os_image = "SUSE:sles-sap-15-sp2-byos:gen2:latest"

# The project requires a pair of SSH keys (public and private) to provision the machines
# The private key is only used to create the SSH connection, it is not uploaded to the machines
# Besides the provisioning, the SSH connection for these keys will be authorized in the created machines
# These keys are provided using the next two variables in 2 different ways
# Path to already existing keys
public_key  = "~/.ssh/id_rsa.pub"
private_key = "~/.ssh/id_rsa"

# Or provide the content of SSH keys
#public_key  = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCt06V...."
#private_key = <<EOF
#-----BEGIN OPENSSH PRIVATE KEY-----
#b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAABFwAAAAdzc2gtcn
#...
#P9eYliTYFxhv/0E7AAAAEnhhcmJ1bHVAbGludXgtYWZqOQ==
#-----END OPENSSH PRIVATE KEY-----
#EOF

# Authorize additional keys optionally (in this case, the private key is not required)
# Path to local files or keys content
#authorized_keys = ["/home/myuser/.ssh/id_rsa_second_key.pub", "/home/myuser/.ssh/id_rsa_third_key.pub", "ssh-rsa AAAAB3NzaC1yc2EAAAA...."]

# An additional pair of SSH keys is needed to provide the HA cluster the capability to SSH among the machines
# These keys are uploaded to the machines!
# If `pre_deployment = true` is used, these keys are autogenerated
cluster_ssh_pub = "salt://sshkeys/cluster.id_rsa.pub"
cluster_ssh_key = "salt://sshkeys/cluster.id_rsa"

##########################
# Other deployment options
##########################

# Repository url used to install HA/SAP deployment packages
# It contains the salt formulas rpm packages and other dependencies.
#
## Specific Release - for latest release look at https://github.com/SUSE/ha-sap-terraform-deployments/releases
# To auto detect the SLE version
#ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:v8/"
# Otherwise use a specific SLE version:
#ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:v8/SLE_15_SP3/"
#
## Development Release (use if on `develop` branch)
# To auto detect the SLE version
#ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:devel/"
# Otherwise use a specific SLE version:
#ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:devel/SLE_15/"
ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/v8//SLE_15_SP3"

# Provisioning log level (error by default)
provisioning_log_level = "info"

# Print colored output of the provisioning execution (true by default)
provisioning_output_colored = true

# Enable pre deployment steps (disabled by default)
pre_deployment = true

# To disable the provisioning process
#provisioner = ""

# Run provisioner execution in background
background = false

# Test and QA purpose

# Define if the deployment is used for testing purpose
# Disable all extra packages that do not come from the image
# Except salt-minion (for the moment) and salt formulas
# true or false (default)
#offline_mode = false

# Execute HANA Hardware Configuration Check Tool to bench filesystems
# true or false (default)
#hwcct = false

# Variables used with native fencing (azure fence agent)
# Make sure to check out the documentation:
# https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/high-availability-guide-suse-pacemaker#create-azure-fence-agent-stonith-device
# The fencing mechanism has to be defined on a per cluster basis.
# fence_agent_app_id = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"       # login
# fence_agent_client_secret = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # password

##########################
# Bastion (jumpbox) machine variables
##########################

# Enable bastion usage. If this option is enabled, it will create a unique public ip address that is attached to the bastion machine.
# The rest of the machines won't have a public ip address and the SSH connection must be done through the bastion
bastion_enabled = false

# Bastion SSH keys. If they are not set the public_key and private_key are used
#bastion_public_key  = "/home/myuser/.ssh/id_rsa_bastion.pub"
#bastion_private_key = "/home/myuser/.ssh/id_rsa_bastion"

# Bastion machine os image. If it is not provided, the os_image variable data is used
# BYOS example
# bastion_os_image = "SUSE:sles-sap-15-sp2-byos:gen2:latest"

#########################
# HANA machines variables
# This example shows the demo option values. Find more options in the README file
#########################

# Hostname, without the domain part
#hana_name = "vmhana"

# HANA configuration
# VM size to use for the cluster nodes
#hana_vm_size = "Standard_E4s_v3"

# Number of nodes in the cluster
# 2 nodes will always be scale-up
# 4+ nodes are needed for scale-out (also set hana_scale_out_enabled=true)
hana_count = "2"

# enable to use HANA scale-out
# hana_scale_out_enabled             = false

# HANA scale-out role assignments (optional, this can be defined automatically based on "hana_scale_out_standby_count")
# see https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.03/en-US/0d9fe701e2214e98ad4f8721f6558c34.html for reference
#hana_scale_out_addhosts = {
#  site1 = "vmhana03:role=standby:group=default:workergroup=default,vmhana05:role=worker:group=default:workergroup=default"
#  site2 = "vmhana04:role=standby:group=default:workergroup=default,vmhana06:role=worker:group=default:workergroup=default"
#}

# HANA scale-out roles
# These role assignments are made per HANA site
# Number of standby nodes per site
#hana_scale_out_standby_count = 1 # default: 1

# majority_maker_vm_size =  "Standard_D2s_v3"
# majority_maker_ip =  "10.74.0.9"

# Instance number for the HANA database. 00 by default.
#hana_instance_number = "00"

# Network options
#hana_enable_accelerated_networking = false

#########################
# shared storage variables
# Needed if HANA is deployed in scale-out scenario
# see https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/hana-vm-operations-netapp
# for reference and minimum requirements
#########################
#hana_scale_out_shared_storage_type = ""      # only anf supported at the moment (default: "")
#anf_pool_size                      = "15"    # min 30TB on Premium, min 15TB on Ultra
#anf_pool_service_level             = "Ultra" # Standard (does not meet KPI), Premium, Ultra
# min requirements Premium
#hana_scale_out_anf_quota_data      = "4000"  # deployed 2x (for each site)
#hana_scale_out_anf_quota_log       = "4000"  # deployed 2x (for each site)  
#hana_scale_out_anf_quota_backup    = "2000"  # deployed 2x (for each site)
#hana_scale_out_anf_quota_shared    = "4000"  # deployed 2x (for each site)
# min requirements Ultra
#hana_scale_out_anf_quota_data      = "2000"  # deployed 2x (for each site)
#hana_scale_out_anf_quota_log       = "2000"  # deployed 2x (for each site)
#hana_scale_out_anf_quota_backup    = "1000"  # deployed 2x (for each site)
#hana_scale_out_anf_quota_shared    = "2000"  # deployed 2x (for each site)

# local disk configuration  - scale-up example
#hana_data_disks_configuration = {
#  disks_type       = "Premium_LRS,Premium_LRS,Premium_LRS,Premium_LRS,Premium_LRS,Premium_LRS"
#  disks_size       = "512,512,512,512,64,1024"
#  caching          = "ReadOnly,ReadOnly,ReadOnly,ReadOnly,ReadOnly,None"
#  writeaccelerator = "false,false,false,false,false,false"
#  luns             = "0,1,2#3#4#5"
#  names            = "datalog#shared#usrsap#backup"
#  lv_sizes         = "70,100#100#100#100"
#  paths            = "/hana/data,/hana/log#/hana/shared#/usr/sap#/hana/backup"
#}

# local Disk configuration - scale-out example
# on scale-out we need shared storage for data/log/backup/shared and fewer local disks
#hana_data_disks_configuration = {
#  disks_type       = "Premium_LRS"
#  disks_size       = "10"
#  caching          = "None"
#  writeaccelerator = "false"
#  # The next variables are used during the provisioning
#  luns        = "0"
#  names       = "usrsap"
#  lv_sizes    = "100"
#  mount_paths = "/usr/sap"
#}

# SLES4SAP image information
# If custom uris are enabled public information will be omitted
# Custom sles4sap image
#sles4sap_uri = "/path/to/your/image"
sles4sap_uri = "https://openqa.blob.core.windows.net/sle-images/SLES15-SP3-SAP-BYOS.x86_64-0.9.10-Azure-Build2.42.vhd"

# Public OS images
# BYOS example
# hana_os_image = "SUSE:sles-sap-15-sp2-byos:gen2:latest"

# The next variables define how the HANA installation software is obtained.
# The installation software must be located in an Azure storage account

# Azure storage account name
storage_account_name = "sapinstmasters"
# Azure storage account secret key (key1 or key2)
storage_account_key = "DUMMY"

# 'hana_inst_master' is an Azure Storage account share where HANA installation files (extracted or not) are stored
# `hana_inst_master` must always be used! It is used as the reference path to the other variables

# Local folder where HANA installation master will be mounted
#hana_inst_folder = "/sapmedia/HANA"

# To configure the usage there are multiple options:
# 1. Use an already extracted HANA Platform folder structure.
# The last numbered folder is the HANA Platform folder with the extracted files with
# something like `HDB:HANA:2.0:LINUX_X86_64:SAP HANA PLATFORM EDITION 2.0::XXXXXX` in the LABEL.ASC file
hana_inst_master = "//sapinstmasters.file.core.windows.net/sapinst/HANA_rev57/HANA_rev57"

# 2. Combine the `hana_inst_master` with `hana_platform_folder` variable.
#hana_inst_master = "//YOUR_STORAGE_ACCOUNT_NAME.file.core.windows.net/sapdata/sap_inst_media"
# Specify the path to already extracted HANA platform installation media, relative to hana_inst_master mounting point.
# This will have preference over hana archive installation media
#hana_platform_folder = "51053381"

# 3. Specify the path to the HANA installation archive file in either of SAR, RAR, ZIP, EXE formats, relative to the 'hana_inst_master' mounting point
# For multipart RAR archives, provide the first part EXE file name.
#hana_archive_file = "51053381_part1.exe"

# 4. If using HANA SAR archive, provide the compatible version of sapcar executable to extract the SAR archive
# HANA installation archives will be extracted to the path specified at hana_extract_dir (optional, by default /sapmedia/HANA)
#hana_archive_file = "IMDB_SERVER.SAR"
#hana_sapcar_exe = "SAPCAR"

# For option 3 and 4, HANA installation archives are extracted to the path specified
# at hana_extract_dir (optional, by default /sapmedia_extract/HANA). This folder cannot be the same as `hana_inst_folder`!
#hana_extract_dir = "/sapmedia_extract/HANA"

# The following SAP HANA Client variables are needed only when you are using a HANA database SAR archive for HANA installation.
# HANA Client is used by monitoring & cost-optimized scenario and it is already included in HANA platform media unless a HANA database SAR archive is used
# You can provide HANA Client in one of the two options below:
# 1. Path to already extracted hana client folder, relative to hana_inst_master mounting point
#hana_client_folder = "SAP_HANA_CLIENT"
# 2. Or specify the path to the hana client SAR archive file, relative to the 'hana_inst_master'. To extract the SAR archive, you need to also provide compatible version of sapcar executable in variable hana_sapcar_exe
# It will be extracted to hana_client_extract_dir path (optional, by default /sapmedia_extract/HANA_CLIENT)
#hana_client_archive_file = "IMDB_CLIENT20_003_144-80002090.SAR"
#hana_client_extract_dir = "/sapmedia_extract/HANA_CLIENT"

# Enable system replication and HA cluster
hana_ha_enabled = true

# Disable minimal memory checks for HANA. Useful to deploy development clusters.
# Low memory usage can cause a failed deployment. Be aware that this option does
# not work with any memory size and will most likely fail with less than 16 GiB
#hana_ignore_min_mem_check = false

# Each host IP address (sequential order). If it's not set the addresses will be auto generated from the provided vnet address range
hana_ips = ["10.74.1.11", "10.74.1.12"]

# IP address used to configure the hana cluster floating IP. It must belong to the same subnet as the hana machines
hana_cluster_vip = "10.74.1.13"

# Enable Active/Active HANA setup (read-only access in the secondary instance)
#hana_active_active = true

# HANA cluster secondary vip. This IP address is attached to the read-only secondary instance. Only needed if hana_active_active is set to true
#hana_cluster_vip_secondary = "10.74.1.14"

# HANA instance configuration
# Find some references about the variables in:
# https://help.sap.com
# HANA instance system identifier. The system identifier must be a 3-character string of uppercase letters/digits, always starting with a letter (there are some restricted options).
#hana_sid = "PRD"
# HANA instance number. It is a 2-digit string
#hana_instance_number = "00"
# HANA instance master password. The password must contain at least 8 characters, comprising 1 digit, 1 upper-case character, 1 lower-case character and no special characters.
#hana_master_password = "YourPassword1234"
# HANA primary site name. Only used if HANA's system replication feature is enabled (hana_ha_enabled to true)
#hana_primary_site = "Site1"
# HANA secondary site name. Only used if HANA's system replication feature is enabled (hana_ha_enabled to true)
#hana_secondary_site = "Site2"
hana_master_password = "DUMMY"

# Cost optimized scenario
#scenario_type = "cost-optimized"

# fencing mechanism for HANA cluster (Options: sbd [default], native)
# hana_cluster_fencing_mechanism = "sbd"

#######################
# SBD related variables
#######################

# In order to enable SBD, an iSCSI server is needed, as right now it is the only option
# All the clusters will use the same mechanism

# Hostname, without the domain part
#iscsi_name = "vmiscsi"

# Custom iscsi server image
#iscsi_srv_uri = "/path/to/your/iscsi/image"
iscsi_srv_uri = "https://openqa.blob.core.windows.net/sle-images/SLES15-SP3-SAP-BYOS.x86_64-0.9.10-Azure-Build2.42.vhd"

# Public image usage for iSCSI. BYOS example
#iscsi_os_image = "SUSE:sles-sap-15-sp2-byos:gen2:latest"

# IP address of the iSCSI server. If it's not set the address will be auto generated from the provided vnet address range
#iscsi_srv_ip = "10.74.1.14"
# Number of LUN (logical units) to serve with the iscsi server. Each LUN can be used as a unique sbd disk
#iscsi_lun_count = 3
# Disk size in GB used to create the LUNs and partitions to be served by the ISCSI service
#iscsi_disk_size = 10

##############################
# Monitoring related variables
##############################

# Custom monitoring server image
#monitoring_uri = "/path/to/your/monitoring/image"

# Public image usage for the monitoring server. BYOS example
#monitoring_os_image = "SUSE:sles-sap-15-sp2-byos:gen2:latest"

# Enable the host to be monitored by exporters
#monitoring_enabled = true

# Hostname, without the domain part
#monitoring_name = "vmmonitoring"

# IP address of the machine where Prometheus and Grafana are running. If it's not set the address will be auto generated from the provided vnet address range
#monitoring_srv_ip = "10.74.1.13"

########################
# DRBD related variables
########################

# Enable drbd cluster
#drbd_enabled = true

# Hostname, without the domain part
#drbd_name = "vmdrbd"

# Custom drbd nodes image
#drbd_image_uri = "/path/to/your/monitoring/image"

# Public image usage for the DRBD machines. BYOS example
#drbd_os_image = "SUSE:sles-sap-15-sp2-byos:gen2:latest"

# Each drbd cluster host IP address (sequential order). If it's not set the addresses will be auto generated from the provided vnet address range
#drbd_ips = ["10.74.1.21", "10.74.1.22"]
#drbd_cluster_vip = "10.74.1.23"

# NFS share mounting point and export. Warning: Since cloud images are using cloud-init, /mnt folder cannot be used as standard mounting point folder
# It will create the NFS export in /mnt_permanent/sapdata/{netweaver_sid} to be connected as {drbd_cluster_vip}:/{netweaver_sid} (e.g.: 192.168.1.20:/HA1)
#drbd_nfs_mounting_point = "/mnt_permanent/sapdata"

# fencing mechanism for DRBD cluster (Options: sbd [default], native)
# drbd_cluster_fencing_mechanism = "sbd"

#############################
# Netweaver related variables
#############################

#netweaver_enabled = true

# Hostname, without the domain part
#netweaver_name = "vmnetweaver"

# Netweaver APP server count (PAS and AAS)
# Set to 0 to install the PAS instance in the same instance as the ASCS. This means only 1 machine is installed in the deployment (2 if HA capabilities are enabled)
# Set to 1 to only enable 1 PAS instance in an additional machine
# Set to 2 or higher to deploy additional AAS instances in new machines
#netweaver_app_server_count = 2

# Custom netweaver nodes image
#netweaver_image_uri = "/path/to/your/monitoring/image"

# Public image usage for the Netweaver machines. BYOS example
#netweaver_os_image = "SUSE:sles-sap-15-sp2-byos:gen2:latest"

# If the addresses are not set they will be auto generated from the provided vnet address range
#netweaver_ips = ["10.74.1.30", "10.74.1.31", "10.74.1.32", "10.74.1.33"]
#netweaver_virtual_ips = ["10.74.1.35", "10.74.1.36", "10.74.1.37", "10.74.1.38"]

# Netweaver installation configuration
# Netweaver system identifier. The system identifier must be a 3-character string of uppercase letters/digits, always starting with a letter (there are some restricted options)
#netweaver_sid = "HA1"
# Netweaver ASCS instance number. It is a 2-digit string
#netweaver_ascs_instance_number = "00"
# Netweaver ERS instance number. It is a 2-digit string
#netweaver_ers_instance_number = "10"
# Netweaver PAS instance number. If additional AAS machines are deployed, they get the next number starting from the PAS instance number. It is a 2-digit string
#netweaver_pas_instance_number = "01"
# NetWeaver or S/4HANA master password. 
# It must follow the SAP Password policies such as having 8 - 14 characters for NetWeaver or 10 - 14 characters for S/4HANA.
# It cannot start with special characters and must contain a combination of
# upper and lower case characters and numbers (Invalid characters are backslash and double quote).
#netweaver_master_password = "SuSE1234"

# Enabling this option will create a ASCS/ERS HA available cluster
#netweaver_ha_enabled = true

# VM sizes
#netweaver_xscs_vm_size = Standard_D2s_v3
#netweaver_app_vm_size = Standard_D2s_v3

# fencing mechanism for Netweaver cluster (Options: sbd [default], native)
# netweaver_cluster_fencing_mechanism = "sbd"

# Set the Netweaver product id. The 'HA' suffix means that the installation uses an ASCS/ERS cluster
# Below are the supported SAP Netweaver product ids if using SWPM version 1.0:
# - NW750.HDB.ABAP
# - NW750.HDB.ABAPHA
# - S4HANA1709.CORE.HDB.ABAP
# - S4HANA1709.CORE.HDB.ABAPHA
# Below are the supported SAP Netweaver product ids if using SWPM version 2.0:
# - S4HANA1809.CORE.HDB.ABAP
# - S4HANA1809.CORE.HDB.ABAPHA
# - S4HANA1909.CORE.HDB.ABAP
# - S4HANA1909.CORE.HDB.ABAPHA
# - S4HANA2020.CORE.HDB.ABAP
# - S4HANA2020.CORE.HDB.ABAPHA
# - S4HANA2021.CORE.HDB.ABAP
# - S4HANA2021.CORE.HDB.ABAPHA

# Example:
#netweaver_product_id = "NW750.HDB.ABAPHA"

#########################
# Netweaver shared storage variables
# Needed if Netweaver is deployed HA
# see https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/hana-vm-operations-netapp
# for reference and minimum requirements
#########################
#netweaver_shared_storage_type      = "drbd"  # drbd,anf supported at the moment (default: "drbd")
#anf_pool_size                      = "15"    # min 30TB on Premium, min 15TB on Ultra -> only set once for Netweaver+HANA
#anf_pool_service_level             = "Ultra" # Standard (does not meet KPI), Premium, Ultra -> only set once for Netweaver+HANA
# min requirements Premium
#netweaver_anf_quota_sapmnt         = "2000"  # deployed 1x
# min requirements Ultra
#netweaver_anf_quota_sapmnt         = "1000"  # deployed 1x

# NFS share to store the Netweaver shared files. Only used if drbd_enabled is not set. For single machine deployments (ASCS and PAS in the same machine) set an empty string
#netweaver_nfs_share = "url-to-your-netweaver-sapmnt-nfs-share"

# Path where netweaver sapmnt data is stored.
#netweaver_sapmnt_path = "/sapmnt"

# Preparing the Netweaver download basket. Check `doc/sap_software.md` for more information

# Azure storage account where all the Netweaver software is available. The next paths are relative to this folder.
#netweaver_storage_account_key = "YOUR_STORAGE_ACCOUNT_KEY"
#netweaver_storage_account_name = "YOUR_STORAGE_ACCOUNT_NAME"
#netweaver_storage_account = "//YOUR_STORAGE_ACCOUNT_NAME.file.core.windows.net/path/to/your/nw/installation/master"

# Netweaver installation required folders
# SAP SWPM installation folder, relative to the netweaver_storage_account mounting point
#netweaver_swpm_folder     =  "your_swpm"
# Or specify the path to the sapcar executable & SWPM installer sar archive, relative to the netweaver_storage_account mounting point
# The sar archive will be extracted to path specified at netweaver_extract_dir under SWPM directory (optional, by default /sapmedia_extract/NW/SWPM)
#netweaver_sapcar_exe = "your_sapcar_exe_file_path"
#netweaver_swpm_sar = "your_swpm_sar_file_path"
# Folder where needed SAR executables (sapexe, sapdbexe) are stored, relative to the netweaver_storage_account mounting point
#netweaver_sapexe_folder   =  "download_basket"
# Additional media archives or folders (added in start_dir.cd), relative to the netweaver_storage_account mounting point
#netweaver_additional_dvds = ["dvd1", "dvd2"]

Logs

The logs mentioned below are quite long for both nodes. Should I paste them here, or rather send them via a separate channel?

This is the list of the required logs (each of the deployed machines will have all of them):

Additional logs might be required to deepen the analysis of the HANA or NetWeaver installation; they will be requested specifically if needed.

Thanks for your time and help!

yeoldegrove commented 2 years ago

I could reproduce this issue. It seems quite odd, as the deployment itself finished successfully:

vmhana01:~ # tail -1 /var/log/salt-result.log
Wed May 25 10:39:34 UTC 2022::vmhana01::[INFO] deployment done

And shortly (30 s) after that, something stops HANA:

prdadm@vmhana01:/usr/sap/PRD/HDB00/vmhana01/trace> cat sapstart.log | grep Stop
(17550) **** 2022/05/25 10:40:05 Caught Signal to Stop all Programs. ****
(17550) Stop Child Process: 17557
lpalovsky commented 2 years ago

Hi, thanks for looking into this. Yes, it is strange indeed. I think I saw multiple restarts of the primary DB in the deployment logs; I am going to look into those once again and report back once I find anything. By the way: if I run the deployment with hana_ha_enabled = false, it will deploy the databases without setting up the HA, right? I am thinking of trying to set up the cluster manually as well.

yeoldegrove commented 2 years ago

If I run the deployment with hana_ha_enabled = false, it will deploy the databases without setting up the HA, right?

Correct.
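For a subsequent manual setup, enabling system replication by hand would look roughly like this (a sketch, assuming SID PRD, instance number 00, and site names Site1/Site2; the cluster itself would then be bootstrapped separately, e.g. with crm cluster init):

# on the designated primary (vmhana01)
su - prdadm -c "hdbnsutil -sr_enable --name=Site1"

# on the secondary (vmhana02); the database must be stopped before registering
su - prdadm -c "HDB stop"
su - prdadm -c "hdbnsutil -sr_register --remoteHost=vmhana01 --remoteInstance=00 --replicationMode=sync --operationMode=logreplay --name=Site2"
su - prdadm -c "HDB start"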

yeoldegrove commented 2 years ago

I narrowed it down to the package version of SAPHanaSR. A deployment with SAPHanaSR-0.154.1-4.14.1 works. A deployment with SAPHanaSR-0.155.0-4.17.1 fails.
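A quick way to check which version a node actually got, and to retest with the older package (a sketch; whether the older version is still available depends on the configured repositories):

# check the installed version on each node
rpm -q SAPHanaSR

# downgrade to the known-good version and keep it from being updated again
zypper install --oldpackage SAPHanaSR-0.154.1-4.14.1
zypper addlock SAPHanaSR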

lpalovsky commented 2 years ago

Hmm, that is interesting... It might not be related, but we are seeing a similar issue on a non-terraform deployment as well. After HANA installation, the primary node is demoted and stopped; the secondary, however, is promoted and running. The difference might be that on the terraform side the primary manages to fail quicker, while the secondary is not yet replicated and therefore cannot be promoted. We are still looking into it with my colleague and will come back once there is some finding.

ab-mohamed commented 2 years ago

I narrowed it down to the package version of SAPHanaSR. A deployment with SAPHanaSR-0.154.1-4.14.1 works. A deployment with SAPHanaSR-0.155.0-4.17.1 fails.

I can confirm that from my side as well, for HANA HA on GCP.

ab-mohamed commented 2 years ago

More troubleshooting:

  1. I removed the two location constraints from the cluster configuration.
  2. I stopped the DB on the secondary node (ab-vmhana01)
  3. I refreshed the failed resource:
    ab-vmhana01:~ # crm resource refresh rsc_SAPHana_PRD_HDB00 ab-vmhana01
  4. DB started on ab-vmhana01, but the system replication failed:
    
    ab-vmhana02:~ # crm_mon -rnf1
    Cluster Summary:
      * Stack: corosync
      * Current DC: ab-vmhana02 (version 2.0.5+20201202.ba59be712-150300.4.21.1-2.0.5+20201202.ba59be712) - partition with quorum
      * Last updated: Wed Jun  1 09:29:00 2022
      * Last change:  Wed Jun  1 09:25:29 2022 by hacluster via crmd on ab-vmhana01
      * 2 nodes configured
      * 8 resource instances configured

    Node List:

    Inactive Resources:

    Migration Summary:


  5. Stopping and starting the cluster services did not fix the issue.
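To dig further into why the replication failed, the usual checks on both nodes would be (a sketch, assuming SID PRD):

# replication state as seen by HANA itself
su - prdadm -c "hdbnsutil -sr_state"

# detailed replication status, run on the primary
su - prdadm -c "HDBSettings.sh systemReplicationStatus.py"

# cluster attributes maintained by the SAPHanaSR agents
SAPHanaSR-showAttr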
ricardobranco777 commented 2 years ago

I narrowed it down to the package version of SAPHanaSR. A deployment with SAPHanaSR-0.154.1-4.14.1 works. A deployment with SAPHanaSR-0.155.0-4.17.1 fails.

Interesting. Is there a bug open for it?

yeoldegrove commented 2 years ago

HANA not coming up at all (rc=7) will be handled in #863.

For the bug about SAPHanaSR-0.155.0-4.17.1 I opened #865.