SUSE / ha-sap-terraform-deployments

Automated SAP/HA Deployments in Public/Private Clouds
GNU General Public License v3.0

S/4HANA HA deployment failure #786

Closed ab-mohamed closed 2 years ago

ab-mohamed commented 2 years ago

Used cloud platform: GCP

Used SLES4SAP version: SLES15SP2 for SAP Applications

Used client machine OS: Google Cloud Shell

Expected behaviour vs observed behaviour:

Expected behaviour: a successful deployment.

Observed behaviour: the S/4HANA deployment failed because the profile path, /sapmnt/HA1/profile, was not created.

How to reproduce

  1. Normal S/4HANA deployment

  2. Create the terraform.tfvars file based on terraform.tfvars.example. I have shared the file internally with the team.

  3. Run the following terraform commands:

    terraform init
    terraform plan
    terraform apply -auto-approve

  4. I noticed the following error messages during the deployment:

    module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [ERROR   ] Specified path /sapmnt/HA1/profile does not exist
    [...]
    module.netweaver_node.module.netweaver_provision.null_resource.provision[2] (remote-exec): [ERROR   ] Specified path /sapmnt/HA1/profile does not exist
    [...]
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): [ERROR   ] Specified path /sapmnt/HA1/profile does not exist
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Started: 09:54:08.936631
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):     Duration: 2070274.5809999995 ms
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Changes:   
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): ----------
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):           ID: wait_for_db_HA1_02
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):     Function: hana.available
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):         Name: 10.0.1.12
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):       Result: False
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Comment: One or more requisite failed: netweaver.install_aas.check_sapprofile_directory_exists_HA1_02
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Started: 10:28:40.860001
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):     Duration: 0.01 ms
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Changes:   
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): ----------
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):           ID: netweaver_install_HA1_02
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):     Function: netweaver.installed
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):         Name: ha1
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):       Result: False
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Comment: One or more requisite failed: netweaver.install_aas.wait_for_db_HA1_02
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Started: 10:28:40.864854
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):     Duration: 0.007 ms
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Changes:  
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): ----------
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):           ID: sap_host_exporter_service_HA1_AAS02
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):     Function: service.running
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):         Name: prometheus-sap_host_exporter@HA1_AAS02
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):       Result: False
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Comment: Service prometheus-sap_host_exporter@HA1_AAS02 has been enabled, and is dead
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Started: 10:28:53.087841
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):     Duration: 439.8 ms
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):      Changes:
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):               ----------
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):               prometheus-sap_host_exporter@HA1_AAS02:
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):                   True
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec):
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): Summary for local
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): -------------
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): Succeeded: 41 (changed=36)
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): Failed:     4
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): -------------
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): Total states run:     45
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): Total run time: 2216.089 s
    module.netweaver_node.module.netweaver_provision.null_resource.provision[3] (remote-exec): Thu Oct 28 10:28:54 UTC 2021::default-netweaver04::[ERROR] deployment failed 

  5. The /sapmnt NFS share was created and mounted, but /sapmnt/HA1/profile was not created. The following is an example from the first S/4HANA node:

    default-netweaver01:~ # df -hT
    Filesystem               Type      Size  Used Avail Use% Mounted on
    devtmpfs                 devtmpfs  7.4G  8.0K  7.4G   1% /dev
    tmpfs                    tmpfs      13G   54M   13G   1% /dev/shm
    tmpfs                    tmpfs     7.4G  9.2M  7.4G   1% /run
    tmpfs                    tmpfs     7.4G     0  7.4G   0% /sys/fs/cgroup
    /dev/sda3                xfs        10G  6.1G  4.0G  61% /
    /dev/sda2                vfat       20M  6.0M   15M  30% /boot/efi
    tmpfs                    tmpfs     1.5G     0  1.5G   0% /run/user/0
    /dev/sdb1                xfs        60G   34G   26G  57% /sapmedia/NW
    10.0.1.22:/HA1/sapmnt    nfs4       10G   42M   10G   1% /sapmnt
    10.0.1.22:/HA1/usrsapsys nfs4       10G   42M   10G   1% /usr/sap/HA1/SYS
    10.0.1.22:/HA1/ASCS      nfs4       10G   42M   10G   1% /usr/sap/HA1/ASCS00
    10.0.1.22:/HA1/ERS       nfs4       10G   42M   10G   1% /usr/sap/HA1/ERS10
    default-netweaver01:~ # ls -lhar /sapmnt/
    total 0
    drwxr-xr-x 24 root root 328 Oct 28 09:53 ..
    drwxr-xr-x  2 root root   6 Oct 28 09:43 .
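
The Salt output in step 4 is long, and the failed states are easy to miss between the Terraform prefixes. As a minimal sketch, assuming the `terraform apply` output was saved to a file, the helper below strips the `(remote-exec):` prefix and prints the ID of every state whose Result is False. The function name `failed_states` and the idea of a saved log file are illustrative assumptions, not part of the project.

```shell
#!/usr/bin/env bash
# Sketch: list the IDs of failed Salt states in saved "terraform apply" output.
# Assumes the layout shown in step 4: an "ID:" line, later followed by "Result: False".

failed_states() {
  local log="$1"
  # Drop everything up to and including the Terraform "(remote-exec): " prefix,
  # remember the most recent state ID, and print it when its Result is False.
  sed -E 's/^.*\(remote-exec\): ?//' "$log" |
    awk '$1 == "ID:" { id = $2 }
         $1 == "Result:" && $2 == "False" { print id }'
}

# Usage (apply.log is a hypothetical file name):
#   terraform apply -auto-approve 2>&1 | tee apply.log
#   failed_states apply.log
```

Because requisites chain, the first ID printed is usually the one worth looking at; later failures such as netweaver_install_HA1_02 only report that an earlier requisite failed.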

Logs: I noticed the following messages in the salt-deployment.log file:

2021-10-28 09:53:46,230 [salt.state       :320 ][INFO    ][4690] {'mount': True, 'persist': 'new'}
2021-10-28 09:53:46,230 [salt.state       :2259][INFO    ][4690] Completed state [/usr/sap/HA1/SYS] at time 09:53:46.230622 (duration_in_ms=90.939)
2021-10-28 09:53:46,230 [salt.state       :2065][INFO    ][4690] Running state [/sapmnt/HA1] at time 09:53:46.230966
2021-10-28 09:53:46,232 [salt.state       :2097][INFO    ][4690] Executing state file.absent for [/sapmnt/HA1]
2021-10-28 09:53:46,233 [salt.state       :320 ][INFO    ][4690] File /sapmnt/HA1 is not present
2021-10-28 09:53:46,234 [salt.state       :2259][INFO    ][4690] Completed state [/sapmnt/HA1] at time 09:53:46.234013 (duration_in_ms=3.046)
2021-10-28 09:53:46,234 [salt.state       :2065][INFO    ][4690] Running state [/usr/sap/HA1/SYS] at time 09:53:46.234236
2021-10-28 09:53:46,235 [salt.state       :2097][INFO    ][4690] Executing state file.directory for [/usr/sap/HA1/SYS]
2021-10-28 09:53:46,240 [salt.loaded.int.states.file:628 ][DEBUG   ][4690] Files to keep from required states: []
2021-10-28 09:53:46,241 [salt.state       :320 ][INFO    ][4690] The directory /usr/sap/HA1/SYS is in the correct state
2021-10-28 09:53:46,241 [salt.state       :2259][INFO    ][4690] Completed state [/usr/sap/HA1/SYS] at time 09:53:46.241504 (duration_in_ms=7.268)
2021-10-28 09:53:46,241 [salt.state       :2065][INFO    ][4690] Running state [/usr/sap/HA1/ASCS00] at time 09:53:46.241804
2021-10-28 09:53:46,242 [salt.state       :2097][INFO    ][4690] Executing state file.directory for [/usr/sap/HA1/ASCS00]
2021-10-28 09:53:46,244 [salt.state       :320 ][INFO    ][4690] {'/usr/sap/HA1/ASCS00': 'New Dir'}
2021-10-28 09:53:46,244 [salt.state       :2259][INFO    ][4690] Completed state [/usr/sap/HA1/ASCS00] at time 09:53:46.244895 (duration_in_ms=3.09)
2021-10-28 09:53:46,245 [salt.state       :2065][INFO    ][4690] Running state [/usr/sap/HA1/ASCS00] at time 09:53:46.245131
2021-10-28 09:53:46,246 [salt.state       :2097][INFO    ][4690] Executing state mount.mounted for [/usr/sap/HA1/ASCS00]
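
The excerpt above shows a file.absent state running against /sapmnt/HA1. As a rough sketch, the helper below prints which state functions were executed for a given path, based only on the "Executing state ... for [...]" lines seen in this log. The function name `states_for_path` is hypothetical, and the log location is left to the caller.

```shell
#!/usr/bin/env bash
# Sketch: show which Salt state functions ran against a given path,
# using the "Executing state <function> for [<path>]" lines of salt-deployment.log.

states_for_path() {
  local path="$1" log="$2"
  # grep -F treats the bracketed path as a fixed string, so "[" and "]" are literal.
  grep -F "for [${path}]" "$log" |
    sed -nE 's/.*Executing state ([a-z_.]+) for \[.*/\1/p'
}

# Usage (the log path is passed in by the caller):
#   states_for_path /sapmnt/HA1 salt-deployment.log
```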

This is the list of the required logs (each of the deployed machines will have all of them). The following log files (from the first S/4HANA node) were shared:

ab-mohamed commented 2 years ago

I was able to create /sapmnt/HA1/profile manually, because the NFS share is mounted rw:

# cat /proc/mounts | grep sapmnt
10.0.1.22:/HA1/sapmnt /sapmnt nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.30,local_lock=none,addr=10.0.1.22 0 0
# mkdir /sapmnt/HA1

# touch /sapmnt/HA1/profile

# ls -lh /sapmnt/HA1/profile
-rw-r--r-- 1 root root 0 Oct 28 11:21 /sapmnt/HA1/profile

# df -hT /sapmnt/
Filesystem            Type  Size  Used Avail Use% Mounted on
10.0.1.22:/HA1/sapmnt nfs4   10G   42M   10G   1% /sapmnt
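
The manual check above can be scripted. This is only a sketch, assuming the standard /proc/mounts field order (device, mount point, type, options); the function name `mount_is_rw` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the given mount point is mounted with the "rw" option.
# The second argument defaults to /proc/mounts and exists mainly for testing.

mount_is_rw() {
  local mountpoint="$1" mounts_file="${2:-/proc/mounts}"
  awk -v mp="$mountpoint" '
    # Field 2 is the mount point, field 4 the comma-separated option list.
    $2 == mp {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "rw") found = 1
    }
    END { exit (found ? 0 : 1) }' "$mounts_file"
}

# Usage:
#   mount_is_rw /sapmnt && echo "/sapmnt is writable at the mount level"
#   [ -e /sapmnt/HA1/profile ] || echo "/sapmnt/HA1/profile is missing"
```

A read-write mount only rules out the mount options as the cause; it does not explain why the deployment never created the path.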

ab-mohamed commented 2 years ago

Quick update: the deployment also failed when I used the develop branch.

ab-mohamed commented 2 years ago

Hi. Any updates regarding this bug? :)

yeoldegrove commented 2 years ago

After a quick investigation, this seems to be related to a missing terraform.tfvars parameter:

netweaver_sapexe_folder   =  "download_basket"
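
A quick way to catch this before running terraform apply is to check that the variable is actually assigned in terraform.tfvars. The sketch below is an assumption about how one might do that with grep; `has_tfvar` is a hypothetical helper, and it only looks for an uncommented `name = ...` line rather than validating the value.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a variable is assigned on an uncommented line of a tfvars file.

has_tfvar() {
  local name="$1" file="$2"
  # Optional leading whitespace, the exact variable name, optional whitespace, then "=".
  # Lines starting with "#" do not match because the name must come first.
  grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$file"
}

# Usage:
#   has_tfvar netweaver_sapexe_folder terraform.tfvars ||
#     echo "netweaver_sapexe_folder is not set in terraform.tfvars"
```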