Closed busetde closed 3 years ago
Seems related to the DRBD problem mentioned on the other thread. Including @nick-wang on the loop.
Hi @diegoakechi, this is a different problem from the DRBD one. I cloned the sap-blue-horizon branch, and DRBD deploys without any change to drbd.sls in pillar_examples/automatic/drbd/. But when I enable the Netweaver deployment there is still an error:
@busetde Could you please send me (nwang@suse.com) the logs or output for this issue? If you have already sent them to @arbulu89, I will ask him when he is online.
Is the test branch sap-blue-horizon with the default automatic pillar files? Are any additional repos or anything else configured?
I will try deploy first before get the logs.
Thanks!
@nick-wang - let me run it again and send you the log to your email.
@busetde Thanks for the logs and information. The DRBD issue was caused by SCC not having released the latest drbd-formula package yet; it is still at 0.4.0. Until the latest version is available in SCC, you need to add ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/v7/SLE_15_SP2/" in terraform.tfvars for SLE15SP2... sorry.
I haven't checked sap-blue-horizon with NW yet. I will try it as well.
@nick-wang
Tested as per your suggestion using the master branch. DRBD now deploys without error, but Netweaver still fails.
Thanks - Budi
@busetde I asked @arbulu89 about the NW setup. Based on your log, could you try with an absolute path for gcp_credentials_file = /xxx/yyy/... and collect the terraform.txt again?
I can't guarantee this change will fix everything, but the log does raise an error from the file provisioner:
file-provisioner (internal) 2021/04/15 02:36:19 [ERROR] scp stderr: "Sink: C0644 2292 google_credentials.json\n"
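For reference, a minimal Python sketch of turning a relative credentials path into the absolute form suggested above before writing it into terraform.tfvars. The function name and the example file name are only illustrative assumptions; the real value is wherever your key file actually lives.

```python
import os


def to_absolute_path(path: str) -> str:
    """Expand a leading ~ and resolve a possibly relative path to an absolute one."""
    return os.path.abspath(os.path.expanduser(path))


if __name__ == "__main__":
    # Hypothetical relative path, used only for illustration.
    print(to_absolute_path("./google_credentials.json"))
```

The printed value is what would go on the right-hand side of gcp_credentials_file.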
@nick-wang - yes, gcp_credentials_file = /xxx/yyy/*.json needs to be set explicitly there.
That part works now; there are other errors in the scp stderr instead. I have checked terraform_220125633.sh and it's a zero-size file.
[TRACE] dag/walk: vertex "provisioner.file (close)" is waiting for "module.drbd_node.module.drbd_provision.null_resource.provision[1]"
remote-exec-provisioner (internal) 2021/04/16 05:59:12 [ERROR] scp stderr: "Sink: C0644 0 terraform_220125633.sh\n
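The "Sink:" lines follow the scp control-message layout C<mode> <size> <name>, so the size field is what reveals the empty script. A small Python sketch, assuming the log text looks like the lines above, that pulls those fields out and flags zero-size transfers (the helper names are made up for this example):

```python
import re
from typing import NamedTuple, Optional

# Matches the scp control record echoed in the terraform log,
# e.g.  Sink: C0644 2292 google_credentials.json\n
SINK_RE = re.compile(r'Sink: C(?P<mode>[0-7]{4}) (?P<size>\d+) (?P<name>[^\\\n"]+)')


class SinkRecord(NamedTuple):
    mode: str
    size: int
    name: str


def parse_sink(line: str) -> Optional[SinkRecord]:
    """Return the mode, size and file name of a 'Sink:' record, or None."""
    match = SINK_RE.search(line)
    if match is None:
        return None
    return SinkRecord(match["mode"], int(match["size"]), match["name"])


def is_empty_transfer(line: str) -> bool:
    """True when the line reports a file that was sent with size 0."""
    record = parse_sink(line)
    return record is not None and record.size == 0
```

Running is_empty_transfer over each line of the terraform log would point at uploads such as terraform_220125633.sh that arrived empty.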
@busetde Thanks for the information. I will keep digging tomorrow. So far, i don't have any clue on it. (It is also the first time for me to deploy NW on GCP, it may take some time.)
Hi Budi,
I would like to have all the stakeholders here on the same page to consolidate our efforts, so please find below points based on my understanding:
Please confirm the above-mentioned points or feel free to update them.
Best regards, Ab
Hi Ab,
Thanks for the summary, so have added some more points:
You would like to create S/4HANA HA cluster on Google Cloud (GCP) using:
- SAP HANA HA cluster as a database backend.
- DRBD HA cluster as a HA NFS service.
The SAP HANA HA cluster deployment was completed successfully.
The DRBD HA cluster deployment was completed successfully, but needed ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/v7/SLE_15_SP2/" to be added.
The SAP S/4HANA HA / Netweaver cluster deployment failed.
You are using SAP automation v7.0.0 -> master branch.
Thanks - Budi
On Mon, Apr 19, 2021 at 3:17 PM Abdelrahman Mohamed < @.***> wrote:
Hi Budi,
I would like to have all the stakeholders here on the same page to consolidate our efforts, so please find below points based on my understanding:
You would like to create S/4HANA HA cluster using:
- SAP HANA HA cluster as a database backend.
- DRBD HA cluster as a HA NFS service.
The SAP HANA HA cluster deployment was completed successfully.
The DRBD HA cluster deployment was completed successfully.
The SAP S/4HANA HA cluster deployment was failed.
You are using SAP automation v7.0.0 -> master branch.
Please confirm the above-mentioned points or feel free to update them.
Best regards, Ab
Hi @busetde ,
@ab-mohamed has sent me the tfvars you used, and many things are incorrect if you want to deploy S4/HANA.
Please check here to see how to set up the bucket: https://github.com/SUSE/ha-sap-terraform-deployments/blob/master/doc/sap_software.md#swpm-version-20
And here to see how to set up the tfvars file for the S4/HANA HA version: https://github.com/SUSE/ha-sap-terraform-deployments/blob/master/gcp/terraform.tfvars.example#L321
Hi @arbulu89,
Thanks for the review. I've also been in touch with @nick-wang, as there was initially an error with the DRBD deployment, but it's good now after adding ha_sap_deployment_repo. The above 2 links are what I am following as well to set up the bucket and upload the S/4HANA 1809 SWPM2, but the challenge is that whenever Netweaver is deployed there's an error:
Since the HANA and DRBD deployments using *_cluster_fencing_mechanism="native" are successful, is the problem specific to the Netweaver configuration then? Here's the latest config I am using, which still gives the error:
netweaver_enabled = true
netweaver_app_server_count = 2
netweaver_machine_type = "n1-standard-4"
netweaver_os_image = "suse-sap-cloud/sles-15-sp2-sap"
netweaver_product_id = "S4HANA1809.CORE.HDB.ABAPHA"
netweaver_software_bucket = "sles-iac/s41809"
netweaver_swpm_folder = "swpm2"
netweaver_sapexe_folder = "kernel"
netweaver_ips = ["10.0.0.30", "10.0.0.31", "10.0.0.32", "10.0.0.33"]
netweaver_virtual_ips = ["10.0.1.34", "10.0.1.35", "10.0.1.36", "10.0.1.37"]
netweaver_sid = "ha1"
netweaver_ascs_instance_number = "00"
netweaver_ers_instance_number = "10"
netweaver_pas_instance_number = "01"
netweaver_master_password = "Suse123456"
netweaver_ha_enabled = true
netweaver_cluster_fencing_mechanism = "native"
Thanks - Budi
Hi @busetde ,
I'm afraid I cannot reproduce the issue, and with the current information I cannot make any good assumption.
If I had to guess what's going on, I would say that for some reason the NW network routes are not being generated, and this causes the grains file rendering issue.
Check the following:
- sap-nw-ascs-route and sap-nw-ers-route in your routes (assuming you are using sap as the workspace). You should have both of them.
- If removing those lines makes the deployment fail, we can confirm that this is the issue, and that for some reason your GCP account is not letting you create those routes (even though it lets you do the same for HANA...). This obviously won't make the deployment succeed, as commenting out these lines will make the cluster creation fail, but it will be a starting point to understand what's happening.
Xabi
Hi @arbulu89,
Thanks for the pointer. I've tested again and now get past the Invalid index error and can proceed with the Netweaver deployment, but I encounter the error below:
Started: 14:57:06.109422
Duration: 1200008.428 ms
Changes:
----------
pid:
3687
retcode:
1
stderr:
stdout:
until nc -zvw5 10.0.1.22 2049;do sleep 30;done : Timed out after 1200 seconds
----------
ID: wait_before_mount_sapmnt_temporary
Function: module.run
Result: False
Comment: One or more requisite failed: netweaver_node.nfs.wait_until_nfs_is_ready
Started: 15:17:06.151843
Duration: 0.008 ms
Changes:
----------
It looks related to NFS. Could you kindly review whether there's something wrong with the tfvars configuration?
drbd_enabled = true
drbd_machine_type = "n1-standard-4"
drbd_os_image = "suse-sap-cloud/sles-15-sp2-sap"
drbd_data_disk_size = "15"
drbd_data_disk_type = "pd-balanced"
drbd_ips = ["10.0.0.20", "10.0.0.21"]
drbd_cluster_vip = "10.0.1.22"
drbd_cluster_fencing_mechanism = "native"
drbd_nfs_mounting_point = "/mnt_permanent/sapdata/"
netweaver_enabled = true
netweaver_app_server_count = 2
netweaver_machine_type = "n1-standard-4"
netweaver_os_image = "suse-sap-cloud/sles-15-sp2-sap"
netweaver_product_id = "S4HANA1809.CORE.HDB.ABAPHA"
netweaver_software_bucket = "sles-iac/s41809"
netweaver_swpm_folder = "swpm2"
netweaver_sapexe_folder = "kernel"
netweaver_sapcar_exe = "SAPCAR.EXE"
netweaver_ips = ["10.0.0.30", "10.0.0.31", "10.0.0.32", "10.0.0.33"]
netweaver_virtual_ips = ["10.0.1.34", "10.0.1.35", "10.0.1.36", "10.0.1.37"]
netweaver_sid = "ha1"
netweaver_ascs_instance_number = "00"
netweaver_ers_instance_number = "10"
netweaver_pas_instance_number = "01"
netweaver_master_password = "Suse123456"
netweaver_ha_enabled = true
netweaver_cluster_fencing_mechanism = "native"
Thanks - Budi
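The failing Salt state above polls the NFS port with nc every 30 seconds and gives up after 1200 seconds. The same check can be run by hand from a Netweaver node to see whether the DRBD virtual IP is reachable at all. Below is a minimal Python sketch of that polling loop; the function name and default values are assumptions chosen to mirror the command in the log, not code taken from the formula.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 1200.0,
                  interval_s: float = 30.0, connect_timeout_s: float = 5.0) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout_s has passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Same idea as `nc -zvw5 host port`: connect, then close immediately.
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return True
        except OSError:
            pass
        # Stop if another full sleep would take us past the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

For example, wait_for_port("10.0.1.22", 2049, timeout_s=60, interval_s=5) returning False from a Netweaver machine would suggest a routing problem towards the virtual IP rather than an NFS configuration problem.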
Hi @arbulu89,
I am deploying Netweaver with netweaver_app_server_count = 1, and after checking the route (virtual IP) again it can progress now. How long will it take to complete the installation so that /mnt_permanent/HA1/profile is created? Below is the log:
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO ] Running state [/mnt_permanent/HA1/profile] at time 21:11:35.343378
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO ] Executing state file.exists for [/mnt_permanent/HA1/profile]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [ERROR ] Specified path /mnt_permanent/HA1/profile does not exist
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO ] Completed state [/mnt_permanent/HA1/profile] at time 21:11:35.347534 (duration_in_ms=4.156)
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO ] State result does not match retry until value, state will be re-run in 30 seconds
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [1h2m50s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [1h3m0s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [1h3m10s elapsed]
Regards - Budi
Hi @busetde, we are progressing bit by bit hehe
This folder is created by the first Netweaver installed (the ASCS node). If this node's installation succeeds, the rest will continue, but if it fails none of the others can continue. Could you check if the netweaver_provision.null_resource.provision[0] installation worked?
Here is some short documentation about how you can check if something went wrong in the NW or S4/HANA installation: https://github.com/SUSE/ha-sap-terraform-deployments/blob/master/doc/troubleshooting.md#netweaver-debugging (these logs are available in the terraform output too)
Hi @arbulu89 ,
Yes, progressing bit by bit. Regarding the route for DRBD: the floating IP address is not automatically reachable from the Netweaver machines, so I needed to deploy HANA and DRBD first and, once I was able to ping the floating IP address, enable Netweaver and apply terraform again. The previous error was likely solved by changing netweaver_sapmnt_path = "/mnt_permanent/sapdata".
Right now I'm at the stage below. Does it usually take long at this stage?
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO ] connecting to SAP HANA database at 10.0.1.12:30015
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [43m20s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [43m30s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [43m40s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [INFO ] connecting to SAP HANA database at 10.0.1.12:30015
Thanks - Budi
Hi @busetde
I don't understand what's going on. This netweaver node shouldn't be trying to connect to the HANA database. What do you have in netweaver_app_server_count and netweaver_ha_enabled in the end?
It should connect as long as the HANA deployment is finished and you have a HANA cluster running with the virtual IP address 10.0.1.12.
Hi @arbulu89
I have below on my configuration
netweaver_app_server_count = 1
netweaver_ha_enabled = false
Should the above work?
Regards - Budi
@busetde Please don't paste logs without saying whether you have made some change; it makes it super difficult to follow things. The above should work. I don't know why the NW machine doesn't find the HANA DB. As I commented, make sure that the HA cluster is working properly on the HANA machines and that it holds the virtual IP address that is used on the NW side.
Hi @arbulu89,
I am using the latest clone, which is 7.1. With this I made good progress: netweaver_provision.null_resource.provision[0] and netweaver_provision.null_resource.provision[1] completed successfully, but strangely I get the messages below from netweaver_provision.null_resource.provision[2] and netweaver_provision.null_resource.provision[3]:
Thanks - Budi
Hi @busetde, once again, this is happening because these machines are not able to access the HANA machine. Check this:
- crm cluster status (most probably with root user). Check if everything looks fine.
- ip a. Check if the 10.0.1.12 address is there (this might be on any of the cluster nodes, wherever the HANA primary is running).
- HDB info
- 10.0.1.12 address
You have to make sure that the HANA DB is running and accessible to other machines.
Xabi
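To make the ip a check less error-prone on nodes with many interfaces, the output can be scanned for the virtual IP programmatically. The Python sketch below assumes the usual "inet <address>/<prefix>" layout of ip a output; the sample text is invented for illustration and is not taken from this deployment.

```python
import ipaddress
import re

# Lines of interest look like:  "    inet 10.0.1.12/32 scope global eth0"
INET_RE = re.compile(r"^\s*inet6?\s+(\S+)", re.MULTILINE)

# Hypothetical `ip a` output, for illustration only.
SAMPLE_OUTPUT = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP
    inet 10.0.0.10/32 scope global eth0
    inet 10.0.1.12/32 scope global eth0
"""


def addresses_from_ip_output(text: str) -> set:
    """Collect every address listed on an inet/inet6 line."""
    found = set()
    for match in INET_RE.finditer(text):
        # ip_interface accepts both "10.0.1.12/32" and a bare "10.0.1.12".
        found.add(ipaddress.ip_interface(match.group(1)).ip)
    return found


def has_address(text: str, address: str) -> bool:
    """True when the given address is assigned on this node."""
    return ipaddress.ip_address(address) in addresses_from_ip_output(text)
```

On a real node the text would come from running ip a (for instance via subprocess), and the check should succeed on exactly one HANA cluster node, the one currently holding the virtual IP.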
Hi @arbulu89
- crm cluster status (most probably with root user). Check if everything looks fine
- ip a. Check if the 10.0.1.12 address is there (this might be in any of the cluster nodes, where the HANA primary is running)
- Confirm in Google Cloud Console there's hana-route
Regards - Budi
Hi @busetde, this is becoming too complex to check with just the provided information. I cannot really judge what's going on. I would need to get the whole log set. Could you wait until the whole execution fails and send me the logs from the failed machines? I would need:
- salt- kind of files in /var/log/
I would need files from the 2 failed systems (netweaver2 and netweaver3).
Maybe the DB installation part failed before those connection messages... Have you observed some error log on netweaver2 with some installation failure?
Hi @busetde,
I had a conversation with @arbulu89 regarding the current status. I would suggest the following action plan:
- crm_mon -rnf and SAPHanaSR-showAttr commands.
- crm_mon -rnf command.
- terraform.tfvars file and any modified Salt pillars. I would strongly recommend starting your debugging by using the default values as much as you can.
- supportconfig command and collect the output files from each node.
Best regards, Ab
Hi @arbulu89,
Please find attached the log requested for netweaver.provision[02] and netweaver.provision[03]. netweaver04.zip netweaver03.zip
Hi @ab-mohamed,
HANA and DRBD deployment is successful. But NW crm_mon got Failed Resource as below:
Will prepare your request and send privately via email to you.
Regards - Budi
Hi @busetde ,
Yes, as I guessed, the NW DB instance installation is failing. I think I have seen this error before, and it is fixed in our development repository.
Could you re-run the deployment changing this in your terraform.tfvars:
ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:devel"
With this you shouldn't get the following error, which is displayed multiple times in salt-deployment.log:
2021-04-21 14:44:05,796 [shaptools.shell :53 ][INFO ][4496] ERROR 2021-04-21 14:44:02.427 (root/sapinst) (startInstallation) [CSiStepExecute.cpp:1104] id=controller.stepExecuted errno=FCO-00011 CSiStepExecute::execute()
2021-04-21 14:44:05,797 [shaptools.shell :53 ][INFO ][4496] The step getDBInfo with step key |NW_ABAP_DB|ind|ind|ind|ind|0|0|NW_GetSidMaybeProfiles|ind|ind|ind|ind|getSid|0|NW_GetSidFromProfilesPartial|ind|ind|ind|ind|havepf|0|NW_getDBInfo|ind|ind|ind|ind|db|0|NW_HDB_getDBInfo|ind|ind|ind|ind|hdb_dbinfo|0|getDBInfo was executed with status ERROR (Last error reported by the step: Caught ESAPinstException in module call: Validator of step '|NW_ABAP_DB|ind|ind|ind|ind|0|0|NW_GetSidMaybeProfiles|ind|ind|ind|ind|getSid|0|NW_GetSidFromProfilesPartial|ind|ind|ind|ind|havepf|0|NW_getDBInfo|ind|ind|ind|ind|db|0|NW_HDB_getDBInfo|ind|ind|ind|ind|hdb_dbinfo|0|getDBInfo' reported an error:
2021-04-21 14:44:05,797 [shaptools.shell :53 ][INFO ][4496] Start SAPinst in interactive mode to solve this problem).
Have a look at whether running with this value removes this error on netweaver03.
Hi @arbulu89,
I have tried with ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:devel" configured in terraform.tfvars, but I still notice the same error in salt-deployment.log.
Attached is the log: netweaver03_devel.zip
Regards - Budi
Hi @busetde ,
I found this in the sapinst_dev.log file:
Product S4HANA 1809 not supported on SAP HANA version [2.00.030.00.1522209842 (fa/hana2sp03)].
This means that the HANA version you are using is not compatible with S4HANA 1809. You will need to get a compatible version and retry. There is nothing we can do here if the SAP versions are not compatible. The manual installation would fail in the same way.
It's a pity that the SAP logs are not that clear and that this is only printed in the dev log.
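Since this message only shows up in sapinst_dev.log, it can be worth scanning that file for it before digging through the Salt output. A short Python sketch, assuming the message keeps the exact wording quoted above (the function name is illustrative):

```python
import re
from typing import Iterable, List, Tuple

# Wording taken from the sapinst_dev.log line quoted in this thread.
UNSUPPORTED_RE = re.compile(
    r"Product (?P<product>.+?) not supported on SAP HANA version \[(?P<version>[^\]]+)\]"
)


def find_unsupported_versions(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (product, hana_version) pairs for every incompatibility message."""
    hits = []
    for line in lines:
        match = UNSUPPORTED_RE.search(line)
        if match:
            hits.append((match["product"], match["version"]))
    return hits
```

Usage would be along the lines of find_unsupported_versions(open(path_to_sapinst_dev_log, errors="replace")), where the log location depends on the installation directory and is not specified here.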
Thank you @arbulu89 for your efforts.
@busetde: I have checked the SAP PAM. HANA 2.0 SPS03 is the minimum HANA version supported with S/4HANA 1809. But you use SLES15 SP2, so I believe that you need to use HANA 2.0 SPS04 revision 48.01 or newer. Please review SAP Note 2235581.
Best regards, Ab
Hi @arbulu89 and @ab-mohamed,
The previous error is resolved now with S/4HANA 1809 using HANA 2.0 SPS04.
Using: ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:devel"
There's another error with netweaver.provision[03] / b-netweaver04 as below:
I have attached the logs for netweaver.provision[02] / b-netweaver03 and netweaver.provision[03] / b-netweaver04.
Thanks - Budi
netweaver04_hanasps04.zip netweaver03_hanasps04.zip
Hi @busetde ,
This error might be normal. The AAS installation is retried repeatedly until some 10 retries (by default) fail, and it won't pass until the DB and PAS installation on netweaver03 has finished. In the logs of netweaver03 I can see that it is still installing (this is the last piece of the logs at least).
So, you will need to wait until netweaver03 is finished. If this one fails, netweaver04 will most probably fail too.
Please wait until failure (until terraform completely fails) to see what happens (and upload the log files in that case).
Edit: by the way, the DB and PAS installation on netweaver03 might take a while (even a few hours, depending on the VM size).
Hi @arbulu89 ,
Thanks for your help, I have completed a successful deployment. May I know which instance's public IP address and which port I should use to connect to the server via SAP GUI?
Thanks - Budi
@busetde, Excellent!
You should have the public IP addresses in the deployment output just after your last screenshot.
Best regards, Ab
Hi @ab-mohamed ,
I understand that I have a list of public IPs, but which instance should I use? I am assuming it's netweaver01 as ASCS. Is that correct, and which port should I connect to?
Thanks - Budi
Hi @arbulu89 and @ab-mohamed ,
Thanks for the support. Closing this issue, as I'm able to deploy it successfully and log on via SAP GUI.
Thanks - Budi
@busetde Glad to see a happy ending!
Used cloud platform GCP
Used SLES4SAP version SLES4SAP-15SP2
Used client machine OS Linux using GCP Cloud Shell
Expected behaviour vs observed behaviour
How to reproduce
Specify the step by step process to reproduce the issue. This usually would look like something like this:
- terraform.tfvars file based on terraform.tfvars.example
- only enabled HANA and Netweaver
Used terraform.tfvars
Paste here the used terraform.tfvars file content. If the file has any secret, change them by dummy information.
Logs
Upload the deployment logs to make the root cause finding easier. The logs might have sensitive secrets exposed. Remove them before uploading anything here. Otherwise, contact @arbulu89 to send the logs privately.