SUSE / ha-sap-terraform-deployments

Automated SAP/HA Deployments in Public/Private Clouds
GNU General Public License v3.0

SAP Netweaver Deployment not working #670

Closed busetde closed 3 years ago

busetde commented 3 years ago

Used cloud platform GCP

Used SLES4SAP version SLES4SAP-15SP2

Used client machine OS Linux using GCP Cloud Shell

Expected behaviour vs observed behaviour

How to reproduce Specify the step by step process to reproduce the issue. This usually would look like something like this:

  1. Go to the GCP folder in Google Cloud Shell and run terraform init and terraform apply
  2. Create the terraform.tfvars file based on terraform.tfvars.example, with only HANA and Netweaver enabled
  3. Run the following terraform commands: terraform init, terraform plan, terraform apply -auto-approve

Used terraform.tfvars Paste here the used terraform.tfvars file content. If the file has any secret, change them by dummy information.

Logs Upload the deployment logs to make the root cause finding easier. The logs might have sensitive secrets exposed. Remove them before uploading anything here. Otherwise, contact @arbulu89 to send the logs privately.

diegoakechi commented 3 years ago

Seems related to the DRBD problem mentioned on the other thread. Including @nick-wang on the loop.

busetde commented 3 years ago

Hi @diegoakechi, this is a different problem from DRBD. I cloned the sap-blue-horizon branch, and DRBD deploys without needing to change drbd.sls in pillar_examples/automatic/drbd/. But when I enable the Netweaver deployment there is still an error:

nick-wang commented 3 years ago

@busetde Could you please send me (nwang@suse.com) the logs or output for this issue? If you have already sent them to @arbulu89, I will ask him when he is online. Is the test branch sap-blue-horizon with the default automatic pillar files? Are any additional repos or other things configured? I will try to deploy first, before getting the logs.

Thanks!

busetde commented 3 years ago

@nick-wang - let me run it again and send you the log to your email.

nick-wang commented 3 years ago

@busetde Thanks for the logs and information. The DRBD issue was caused by SCC not yet having released the latest drbd-formula package; it is still at 0.4.0. You need to add ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/v7/SLE_15_SP2/" in terraform.tfvars for SLE15SP2 until the latest version is available in SCC... sorry

I haven't checked sap-blue-horizon with NW yet. I will try it as well.

busetde commented 3 years ago

@nick-wang I tested as you suggested using the master branch, and DRBD now works without error, but Netweaver still fails.

Thanks - Budi

nick-wang commented 3 years ago

@busetde I asked @arbulu89 about the NW setup. Based on your log, could you try with an absolute path for gcp_credentials_file = /xxx/yyy/... and collect the terraform.txt again?

I can't guarantee this change will fix everything, but the log does raise an error from the file provisioner: file-provisioner (internal) 2021/04/15 02:36:19 [ERROR] scp stderr: "Sink: C0644 2292 google_credentials.json\n".
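
A quick pre-flight check for the absolute-path suggestion, as a minimal sketch in Python. It reads the text of terraform.tfvars and reports whether the gcp_credentials_file value is an absolute path. The function name is hypothetical and not part of the project, and the parsing assumes a simple key = "value" line.

```python
import posixpath
import re


def credentials_path_is_absolute(tfvars_text, key="gcp_credentials_file"):
    """Return True/False for the path assigned to `key`, or None if the key is not set.

    Hypothetical helper: it only understands plain `key = "value"` lines,
    which is an assumption about how the tfvars file is written.
    """
    # Commented-out lines (starting with '#') do not match this pattern.
    pattern = re.compile(r'^\s*' + re.escape(key) + r'\s*=\s*"([^"]*)"', re.MULTILINE)
    match = pattern.search(tfvars_text)
    if match is None:
        return None
    # POSIX rules are used because the client here is a Linux shell.
    return posixpath.isabs(match.group(1))
```

For example, calling it on the contents of terraform.tfvars before terraform apply gives False for a relative value such as "google_credentials.json".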

busetde commented 3 years ago

@nick-wang - yes, I need to explicitly set gcp_credentials_file = /xxx/yyy/*.json there. That part now works, and there are other errors in the scp stderr instead: I have checked terraform_220125633.sh and it's a zero-size file.

[TRACE] dag/walk: vertex "provisioner.file (close)" is waiting for "module.drbd_node.module.drbd_provision.null_resource.provision[1]"
remote-exec-provisioner (internal) 2021/04/16 05:59:12 [ERROR] scp stderr: "Sink: C0644 0 terraform_220125633.sh\n

nick-wang commented 3 years ago

@busetde Thanks for the information. I will keep digging tomorrow. So far, I don't have any clue about it. (It is also my first time deploying NW on GCP, so it may take some time.)

ab-mohamed commented 3 years ago

Hi Budi,

I would like to have all the stakeholders here on the same page to consolidate our efforts, so please find the points below, based on my understanding:

  1. You would like to create S/4HANA HA cluster using:
    • SAP HANA HA cluster as a database backend.
    • DRBD HA cluster as a HA NFS service.
  2. The SAP HANA HA cluster deployment was completed successfully.
  3. The DRBD HA cluster deployment was completed successfully.
  4. The SAP S/4HANA HA cluster deployment failed.
  5. You are using SAP automation v7.0.0 -> master branch.

Please confirm the above-mentioned points or feel free to update them.

Best regards, Ab

busetde commented 3 years ago

Hi Ab,

Thanks for the summary. I have added some more points:

  1. You would like to create S/4HANA HA cluster on Google Cloud (GCP) using:
    • SAP HANA HA (active-active) cluster as a database backend.
    • DRBD HA cluster as a HA NFS service with fencing mechanism="native" as supported on GCP.
  2. The SAP HANA HA cluster deployment was completed successfully.
  3. The DRBD HA cluster deployment was completed successfully, but ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/v7/SLE_15_SP2/" needed to be added.
  4. The SAP S/4HANA HA / Netweaver cluster deployment failed.
  5. You are using SAP automation v7.0.0 -> master branch.

Thanks - Budi


arbulu89 commented 3 years ago

Hi @busetde, @ab-mohamed has sent me the tfvars you used, and many things are incorrect if you want to deploy S/4HANA. Please check here to see how to set up the bucket: https://github.com/SUSE/ha-sap-terraform-deployments/blob/master/doc/sap_software.md#swpm-version-20

And here is how to set up the tfvars file for the S/4HANA HA version: https://github.com/SUSE/ha-sap-terraform-deployments/blob/master/gcp/terraform.tfvars.example#L321

busetde commented 3 years ago

Hi @arbulu89,

Thanks for the review. I've also been in touch with @nick-wang, as there was initially an error with the DRBD deployment, but it's good now after adding ha_sap_deployment_repo. The above 2 links are what I am following as well to set up the bucket and upload the S/4HANA 1809 SWPM2, but the challenge is that whenever Netweaver is deployed there's an error:

Since the HANA and DRBD deployments using *_cluster_fencing_mechanism="native" are successful, is the problem specific to the Netweaver configuration then? Here's the latest config I am using, which still gets an error:

netweaver_enabled = true
netweaver_app_server_count = 2
netweaver_machine_type = "n1-standard-4"
netweaver_os_image = "suse-sap-cloud/sles-15-sp2-sap"
netweaver_product_id = "S4HANA1809.CORE.HDB.ABAPHA"
netweaver_software_bucket = "sles-iac/s41809"
netweaver_swpm_folder = "swpm2"
netweaver_sapexe_folder = "kernel"
netweaver_ips = ["10.0.0.30", "10.0.0.31", "10.0.0.32", "10.0.0.33"]
netweaver_virtual_ips = ["10.0.1.34", "10.0.1.35", "10.0.1.36", "10.0.1.37"]
netweaver_sid = "ha1"
netweaver_ascs_instance_number = "00"
netweaver_ers_instance_number = "10"
netweaver_pas_instance_number = "01"
netweaver_master_password = "Suse123456"
netweaver_ha_enabled = true
netweaver_cluster_fencing_mechanism = "native"

Thanks - Budi

arbulu89 commented 3 years ago

Hi @busetde ,

I'm afraid I cannot reproduce the issue, and with the current information I cannot make any good assumption.

If I had to guess what's going on, I would say that for some reason the NW network routes are not being generated, and this causes the grains file rendering issue.

Check the following:

  1. Check if you have both sap-nw-ascs-route and sap-nw-ers-route in your routes (assuming you are using sap as the workspace). You should have the 2 of them.
  2. Temporarily comment out or remove the following 2 lines: https://github.com/SUSE/ha-sap-terraform-deployments/blob/master/gcp/modules/netweaver_node/salt_provisioner.tf#L46 https://github.com/SUSE/ha-sap-terraform-deployments/blob/master/gcp/modules/netweaver_node/salt_provisioner.tf#L47

If the deployment gets past the grains file rendering error once those lines are removed, we can confirm that this is the issue, and that for some reason your GCP account is not letting you create those routes (even though it lets you do the same for HANA...). This obviously won't make the deployment succeed, as commenting out these lines will make the cluster creation fail, but it will be a starting point to understand what's happening.

Xabi

busetde commented 3 years ago

Hi @arbulu89,

Thanks for the pointer. I've tested again and am now past the Invalid index error and able to proceed with the netweaver deployment, but I encounter the error below:

Started: 14:57:06.109422
    Duration: 1200008.428 ms
     Changes:   
              ----------
              pid:
                  3687
              retcode:
                  1
              stderr:
              stdout:
                  until nc -zvw5 10.0.1.22 2049;do sleep 30;done : Timed out after 1200 seconds
----------
          ID: wait_before_mount_sapmnt_temporary
    Function: module.run
      Result: False
     Comment: One or more requisite failed: netweaver_node.nfs.wait_until_nfs_is_ready
     Started: 15:17:06.151843
    Duration: 0.008 ms
     Changes:   
----------

Looks like it's related to NFS. Could you kindly review whether there's something wrong with the tfvars variables?

drbd_enabled = true
drbd_machine_type = "n1-standard-4"
drbd_os_image = "suse-sap-cloud/sles-15-sp2-sap"
drbd_data_disk_size = "15"
drbd_data_disk_type = "pd-balanced"
drbd_ips = ["10.0.0.20", "10.0.0.21"]
drbd_cluster_vip = "10.0.1.22"
drbd_cluster_fencing_mechanism = "native"
drbd_nfs_mounting_point = "/mnt_permanent/sapdata/"

netweaver_enabled = true
netweaver_app_server_count = 2
netweaver_machine_type = "n1-standard-4"
netweaver_os_image = "suse-sap-cloud/sles-15-sp2-sap"
netweaver_product_id = "S4HANA1809.CORE.HDB.ABAPHA"
netweaver_software_bucket = "sles-iac/s41809"
netweaver_swpm_folder = "swpm2"
netweaver_sapexe_folder = "kernel"
netweaver_sapcar_exe = "SAPCAR.EXE"
netweaver_ips = ["10.0.0.30", "10.0.0.31", "10.0.0.32", "10.0.0.33"]
netweaver_virtual_ips = ["10.0.1.34", "10.0.1.35", "10.0.1.36", "10.0.1.37"]
netweaver_sid = "ha1"
netweaver_ascs_instance_number = "00"
netweaver_ers_instance_number = "10"
netweaver_pas_instance_number = "01"
netweaver_master_password = "Suse123456"
netweaver_ha_enabled = true
netweaver_cluster_fencing_mechanism = "native"
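
The failing state in the log above loops on nc -zvw5 10.0.1.22 2049 until it times out after 1200 seconds. The same probe can be run by hand from a netweaver node before re-applying; the following is only a sketch in Python under stated assumptions. The function names are hypothetical, and the default deadline, interval and probe timeout are simply copied from the log line (1200 s, 30 s, 5 s).

```python
import socket
import time


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable routes and timeouts.
        return False


def wait_for_port(host, port, deadline_s=1200.0, interval_s=30.0, probe_timeout_s=5.0):
    """Probe host:port repeatedly; return True on success, False once the deadline would pass."""
    end = time.monotonic() + deadline_s
    while True:
        if port_is_open(host, port, probe_timeout_s):
            return True
        if time.monotonic() + interval_s > end:
            return False
        time.sleep(interval_s)
```

With the values from this configuration, wait_for_port("10.0.1.22", 2049) returning False would point at the DRBD virtual IP or its route rather than at the netweaver variables.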

Thanks - Budi

busetde commented 3 years ago

Hi @arbulu89,

I deployed Netweaver with netweaver_app_server_count = 1, and after checking the route (virtual IP) again, it can progress now. How long will it take to complete the installation so that /mnt_permanent/HA1/profile is created? Below is the log:

module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO    ] Running state [/mnt_permanent/HA1/profile] at time 21:11:35.343378
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO    ] Executing state file.exists for [/mnt_permanent/HA1/profile]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [ERROR   ] Specified path /mnt_permanent/HA1/profile does not exist
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO    ] Completed state [/mnt_permanent/HA1/profile] at time 21:11:35.347534 (duration_in_ms=4.156)
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO    ] State result does not match retry until value, state will be re-run in 30 seconds
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [1h2m50s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [1h3m0s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [1h3m10s elapsed]

Regards - Budi

arbulu89 commented 3 years ago

Hi @busetde We are progressing bit by bit, hehe

This folder is created by the first Netweaver installation (ASCS node). If this node's installation succeeds, the rest will continue, but if it fails none of the others can continue. Could you check if the netweaver_provision.null_resource.provision[0] installation worked? Here is some short documentation on how to check if something went wrong in the NW or S/4HANA installation: https://github.com/SUSE/ha-sap-terraform-deployments/blob/master/doc/troubleshooting.md#netweaver-debugging (these logs are available in the terraform output too)

busetde commented 3 years ago

Hi @arbulu89 ,

Yes, in progress bit by bit. Regarding the route for DRBD, the floating IP address is not automatically reachable from the netweaver compute instances, so I need to deploy HANA and DRBD first and, once I am able to ping the floating IP address, enable netweaver and apply the terraform again. The previous error was likely solved by changing to netweaver_sapmnt_path = "/mnt_permanent/sapdata".

Right now I'm at the stage below. Does it usually take long at this stage?

module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO    ] connecting to SAP HANA database at 10.0.1.12:30015
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [43m20s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [43m30s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [43m40s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO    ] connecting to SAP HANA database at 10.0.1.12:30015

Thanks - Budi

arbulu89 commented 3 years ago

Hi @busetde

I don't understand what's going on. This netweaver node shouldn't be trying to connect to the HANA database. What do you have in netweaver_app_server_count and netweaver_ha_enabled in the end?

It should connect as long as the HANA deployment is finished and you have a HANA cluster running with the virtual IP address on that specific address 10.0.1.12

busetde commented 3 years ago

Hi @arbulu89

I have the below in my configuration:

netweaver_app_server_count = 1
netweaver_ha_enabled = false

Should the above work?

Regards - Budi

arbulu89 commented 3 years ago

@busetde Please don't paste logs without saying whether you have made some change; it makes things super difficult to follow. The above should work. I don't know why the NW machine doesn't find the HANA DB. As I commented, make sure that the HA cluster is working properly on the HANA machines and that it holds the virtual IP address that is used on the NW side.

busetde commented 3 years ago

Hi @arbulu89,

I am using the latest clone, which is 7.1. With this I made good progress: netweaver_provision.null_resource.provision[0] and netweaver_provision.null_resource.provision[1] completed successfully, but strangely below are the messages from netweaver_provision.null_resource.provision[2] and netweaver_provision.null_resource.provision[3]:

image

Thanks - Budi

arbulu89 commented 3 years ago

Hi @busetde, once again, this is happening because these machines are not able to access the HANA machine. Check this:

  1. Access any of the HANA machines and run:
    • crm cluster status (most probably as the root user). Check if everything looks fine.
    • ip a. Check if the 10.0.1.12 address is there (it might be on any of the cluster nodes, wherever the HANA primary is running).
    • Check if HANA is running. Log in as your sid admin user and run HDB info.
  2. If this is fine, check if the HANA route has been created in the Google portal. You need to have a route with the 10.0.1.12 address.

You have to ensure that the HANA DB is running and accessible to other machines.
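
For the ip a check in particular, a small sketch in Python can help avoid misreading the output, for example mistaking 10.0.1.120 for 10.0.1.12. The function name is hypothetical, and it assumes the usual "inet <address>/<prefix>" lines that ip a prints for IPv4 addresses; it only inspects text that you have already captured on the HANA node.

```python
import re


def address_is_assigned(ip_a_output, address):
    """Return True if `address` appears as an IPv4 'inet' entry in `ip a` output.

    Hypothetical helper: it compares whole addresses, so 10.0.1.12 does not
    match a line that carries 10.0.1.120.
    """
    found = re.findall(
        r'^\s*inet\s+(\d+\.\d+\.\d+\.\d+)(?:/\d+)?',
        ip_a_output,
        re.MULTILINE,
    )
    return address in found
```

If it returns False on both HANA nodes for 10.0.1.12, the cluster is not holding the virtual IP and the route check in step 2 will not help yet.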

Xabi

busetde commented 3 years ago

Hi @arbulu89

  1. Access to any of the HANA machines and run: I confirm that the 3 checks look fine; screenshot attached.

image

  2. If this is fine check if the HANA route has been created in the google portal. You need to have a route with 10.0.1.12 address

I confirm that there is a hana-route in the Google Cloud Console.

image

Regards - Budi

arbulu89 commented 3 years ago

Hi @busetde, this is becoming too complex to check with just the provided information. I cannot really judge what's going on. I would need to get the whole log set. Could you wait until the whole execution fails and send me the logs from the failed machines? I would need:

Maybe the DB installation part failed before those connection messages... Have you observed any error log on netweaver2 with an installation failure?

ab-mohamed commented 3 years ago

Hi @busetde,

I had a conversation with @arbulu89 regarding the current status. I would suggest the following action plan:

  1. Create a new environment using the same storage bucket you use now.
  2. For the new environment, please wait until the whole deployment has completed. For now, it doesn't matter if it fails. The engineering team needs to reference a specific stage to be able to reproduce the issue you have.
  3. After the deployment failure, please check the SAP landscape status:
    • First, start with the HANA cluster and confirm whether its deployment succeeded or failed. I would suggest using the crm_mon -rnf and SAPHanaSR-showAttr commands.
    • Next, check the DRBD HA cluster status and whether the NFS shares are exported correctly.
    • Finally, check the NetWeaver stack:
      • Check the ASCS and ERS HA cluster status using the crm_mon -rnf command.
      • Check the PAS and AAS server status.
  4. Please share the terraform.tfvars file and any modified Salt pillars. I would strongly recommend starting your debugging by using the default values as much as you can.
  5. For any failed stack, ie: HANA, DRBD or NetWeaver, please collect the following files:
    • /var/log/salt-os-setup.log
    • /var/log/salt-predeployment.log
    • /var/log/salt-deployment.log
    • /var/log/salt-result.log
    • supportconfig output: run the supportconfig command on each node and collect the output files.
  6. In a separate file, please list the content of your bucket sharing the software versions used for your deployment.
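
The four Salt logs from point 5 can be gathered per node with a few lines of Python; this is a minimal sketch, not part of the project. The function name and archive layout are hypothetical, and only the file names come from the list above. The logs may contain secrets, so review them before sharing.

```python
import tarfile
from pathlib import Path

# File names taken from the list above; they live under /var/log on each node.
SALT_LOGS = [
    "salt-os-setup.log",
    "salt-predeployment.log",
    "salt-deployment.log",
    "salt-result.log",
]


def collect_salt_logs(archive_path, log_dir="/var/log"):
    """Pack the Salt logs that exist into a gzip tar archive.

    Returns the list of expected log names that were not found, so a missing
    stage is visible instead of silently skipped.
    """
    missing = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in SALT_LOGS:
            path = Path(log_dir) / name
            if path.is_file():
                # Store only the file name, not the full /var/log path.
                archive.add(str(path), arcname=name)
            else:
                missing.append(name)
    return missing
```

A missing salt-deployment.log, for instance, would suggest the node never reached the deployment stage.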

Best regards, Ab

busetde commented 3 years ago

Hi @arbulu89,

Please find attached the log requested for netweaver.provision[02] and netweaver.provision[03]. netweaver04.zip netweaver03.zip

Hi @ab-mohamed,

HANA and DRBD deployment is successful, but crm_mon on NW shows a failed resource, as below: image

I will prepare what you requested and send it to you privately via email.

Regards - Budi

arbulu89 commented 3 years ago

Hi @busetde ,

Yes, as I guessed, the NW DB instance installation is failing. I think I have seen this error before, and it is fixed in our development repository.

Could you re-run the deployment with this change in your terraform.tfvars:

ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:devel"

With this you shouldn't get the following error, which appears in salt-deployment.log multiple times:

2021-04-21 14:44:05,796 [shaptools.shell  :53  ][INFO    ][4496] ERROR      2021-04-21 14:44:02.427 (root/sapinst) (startInstallation) [CSiStepExecute.cpp:1104] id=controller.stepExecuted errno=FCO-00011 CSiStepExecute::execute()
2021-04-21 14:44:05,797 [shaptools.shell  :53  ][INFO    ][4496] The step getDBInfo with step key |NW_ABAP_DB|ind|ind|ind|ind|0|0|NW_GetSidMaybeProfiles|ind|ind|ind|ind|getSid|0|NW_GetSidFromProfilesPartial|ind|ind|ind|ind|havepf|0|NW_getDBInfo|ind|ind|ind|ind|db|0|NW_HDB_getDBInfo|ind|ind|ind|ind|hdb_dbinfo|0|getDBInfo was executed with status ERROR (Last error reported by the step: Caught ESAPinstException in module call: Validator of step '|NW_ABAP_DB|ind|ind|ind|ind|0|0|NW_GetSidMaybeProfiles|ind|ind|ind|ind|getSid|0|NW_GetSidFromProfilesPartial|ind|ind|ind|ind|havepf|0|NW_getDBInfo|ind|ind|ind|ind|db|0|NW_HDB_getDBInfo|ind|ind|ind|ind|hdb_dbinfo|0|getDBInfo' reported an error:
2021-04-21 14:44:05,797 [shaptools.shell  :53  ][INFO    ][4496] Start SAPinst in interactive mode to solve this problem).

Have a look if the execution with this value removes this error in netweaver03

busetde commented 3 years ago

Hi @arbulu89,

Have tried with the ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:devel" configured in terraform.tfvars image

But still notice the same error in salt-deployment.log

Attached is the log: netweaver03_devel.zip

Regards - Budi

arbulu89 commented 3 years ago

Hi @busetde ,

I found this on the sapinst_dev.log file:

Product S4HANA 1809 not supported on SAP HANA version [2.00.030.00.1522209842 (fa/hana2sp03)].

This means that the HANA version you are using is not compatible with S4HANA 1809. You will need to get a compatible version and retry. There is not anything we can do here if the SAP versions are not compatible. The manual installation would fail in the same way.

It's a pity that the SAP logs are not that clear and that this is only printed in the dev log.
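
Since the message only shows up in sapinst_dev.log, a short scan of that file can surface it early. The following is a sketch in Python that assumes the message keeps exactly the wording quoted above; the function name is hypothetical, and other SAP error formats are not covered.

```python
import re

# Pattern built from the single message quoted in this thread.
UNSUPPORTED_HANA = re.compile(
    r"Product (?P<product>.+?) not supported on SAP HANA version \[(?P<version>[^\]]+)\]"
)


def find_unsupported_hana(log_text):
    """Return (product, hana_version) pairs for every matching line in the log text."""
    return [
        (match.group("product"), match.group("version"))
        for match in UNSUPPORTED_HANA.finditer(log_text)
    ]
```

Reading sapinst_dev.log into a string and passing it to find_unsupported_hana returns an empty list when no such message is present.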

ab-mohamed commented 3 years ago

Thank you @arbulu89 for your efforts.

@busetde: I have checked the SAP PAM. HANA 2.0 SPS03 is the min HANA version used with S/4HANA 1809. But you use SLES15 SP2, so I believe that you need to use HANA 2.0 SPS04 revision 48.01 and newer. Please review SAP Note 2235581.

Best regards, Ab

busetde commented 3 years ago

Hi @arbulu89 and @ab-mohamed,

The previous error is now resolved with S/4HANA 1809 using HANA 2.0 SPS04. Using: ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:devel" there is another error with netweaver.provision[03] / b-netweaver04, as below: image

I have attached the logs for netweaver.provision[02] / b-netweaver03 and netweaver.provision[03] / b-netweaver04

Thanks - Budi netweaver04_hanasps04.zip netweaver03_hanasps04.zip

arbulu89 commented 3 years ago

Hi @busetde, this error might be normal. The AAS installation is retried until some 10 attempts (by default) have failed, and it won't pass until the DB and PAS installation on netweaver03 has finished. In the logs of netweaver03 I can see that it is still installing (this is the last piece of the logs, at least). So you will need to wait until netweaver03 is finished. If that one fails, netweaver04 will most probably fail too.

Please wait until failure (until terraform completely fails) to see what happens (and upload the log files in that case).

Edit: by the way, the DB and PAS installation on netweaver03 might take a while (even a few hours, depending on the VM size).

busetde commented 3 years ago

Hi @arbulu89 ,

Thanks for your help, I have completed a successful deployment. May I know which instance's public IP address and which port I should use to connect to the server via SAP GUI?

image

Thanks - Budi

ab-mohamed commented 3 years ago

@busetde, Excellent!

You should have the public IP addresses in the deployment output just after your last screenshot.

Best regards, Ab

busetde commented 3 years ago

Hi @ab-mohamed ,

I understand that I have a list of public IPs, but which instance should I use? I am assuming it's netweaver01 as ASCS. Is that correct, and which port should I connect to?

Thanks - Budi

busetde commented 3 years ago

Hi @arbulu89 and @ab-mohamed ,

Thanks for the support. Closing this issue, as I'm able to deploy successfully and log on via SAP GUI.

Thanks - Budi

arbulu89 commented 3 years ago

@busetde Glad to see a happy ending!