Esri / arcgis-cookbook

Chef cookbooks for ArcGIS
Apache License 2.0

Health check fails on primary portal logs on arcgis enterprise 10.9.1 with HA #309

Closed makariw closed 2 years ago

makariw commented 2 years ago

Hi,

I have used the latest recipes (arcgis-3.8.0-cookbooks) to install ArcGIS enterprise with HA (active - passive).

I have a load balancer in front of 2 VMs running Windows 2016

I have set the private and web context URLs for server as shown below:

"private_url": "https://lb.domain.com/server",
"web_context_url": "https://lb.domain.com/server",

Similarly for portal:

"private_url": "https://lb.domain.com/portal",
"web_context_url": "https://lb.domain.com/portal",

My primary machine keeps reporting the following in the Portal logs

Unable to reach the ArcGIS Portal Directory https://lb.domain.com:7443/arcgis/sharing/rest. Restart the Portal for ArcGIS service and try again. If the problem persists, contact Esri technical support (U.S.) or your distributor (customers outside the U.S.). Health Check failed, the portal is not ready.

I can login to Portal home page and everything else seems fine.

I can manually reach the following url

https://lb.domain.com/portal/sharing/rest (through portal web adaptor)

but https://lb.domain.com:7443/arcgis/sharing/rest does not work.

Any idea how I can ensure that my Portal can run this health check successfully? Is there something wrong with my config?

Regards,

Makari

cameronkroeker commented 2 years ago

Hi Makari,

Is the load balancer pointing to both portal web adaptors, or to the portals' internal 7443 URLs? For example:

https://lb.domain.com/portal -> https://portalWebAdaptor1.domain.com/portal and https://portalWebAdaptor2.domain.com/portal

or

https://lb.domain.com/portal -> https://portal1.domain.com:7443/arcgis and https://portal2.domain.com:7443/arcgis

For now I will assume the load balancer is routing requests to the portal web adaptors. In this scenario you will want to set the following attributes:

arcgis-enterprise-primary.json

"system_properties": {
   "privatePortalURL": "https://lb.domain.com/portal",
   "WebContextURL": "https://lb.domain.com/portal"
}

If you have a load balancer that routes requests to both the web adaptors and the internal 7443 port, then it would look something like this:

https://lb.domain.com/portal -> https://portalWebAdaptor1.domain.com/portal and https://portalWebAdaptor2.domain.com/portal

and

https://lb.domain.com:7443/arcgis -> https://portal1.domain.com:7443/arcgis and https://portal2.domain.com:7443/arcgis

"system_properties": {
   "privatePortalURL": "https://lb.domain.com:7443/arcgis",
   "WebContextURL": "https://lb.domain.com/portal"
}

Thanks, Cameron K.
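
The two load balancer layouts described above differ only in the value of privatePortalURL. A minimal Python sketch of that decision follows. The function name `portal_system_properties` and its parameters are hypothetical helpers for illustration; only the two attribute names and the example URLs come from the thread.

```python
import json


def portal_system_properties(lb_host, lb_exposes_7443, web_adaptor_name="portal"):
    """Return system_properties for the two load balancer layouts in the thread.

    lb_exposes_7443=False: the load balancer only routes to the web adaptors,
    so both URLs use the web adaptor context.
    lb_exposes_7443=True: the load balancer also forwards port 7443 to the
    portal machines, so privatePortalURL uses the internal 7443 context.
    """
    web_context_url = f"https://{lb_host}/{web_adaptor_name}"
    if lb_exposes_7443:
        private_url = f"https://{lb_host}:7443/arcgis"
    else:
        private_url = web_context_url
    return {"privatePortalURL": private_url, "WebContextURL": web_context_url}


if __name__ == "__main__":
    # Layout reported later in the thread: ports 7443 and 6443 are not open on the LB.
    props = portal_system_properties("lb.domain.com", lb_exposes_7443=False)
    print(json.dumps({"system_properties": props}, indent=3))
```

The printed fragment has the same shape as the "system_properties" blocks above and would go into the node's JSON attributes file.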

makariw commented 2 years ago

Hi Cameron,

Thanks for responding so quickly. We do not have ports 7443 and 6443 open on the load balancer, so https://lb.domain.com/portal points at

https://portal1.domain.com/portal
https://portal2.domain.com/portal

I am not sure what is directing traffic to https://lb.domain.com:7443

Is this likely to be due to rules on the load balancer?

Regards,

Makari.

cameronkroeker commented 2 years ago

What values are defined for arcgis.portal.system_properties.privatePortalURL and arcgis.portal.system_properties.WebContextURL attributes?

Also, let's check what is set as the adminURL at the following endpoint (it requires a token, so first sign in or pass a token in the request):

https://portal1.domain.com:7443/arcgis/portaladmin/machines/machines?f=pjson

{"machines": [
    {
        "machineName": "standbyportal.domain.com",
        "adminURL": "https://standbyportal.domain.com:7443/arcgis",
        "role": "standby",
        "platform": ""
    },
    {
        "machineName": "primaryportal.domain.com",
        "adminURL": "https://primaryportal.domain.com:7443/arcgis",
        "role": "primary",
        "platform": ""
    }
]}

If there was an /etc/hosts entry mapping the private IP of the portal machine to lb.domain.com, this could cause the adminURL to pick lb.domain.com as its internal hostname, which is perhaps the culprit.

Thanks, Cameron K.
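
As a rough aid for reading the machines response, the Python sketch below flags any machine whose adminURL host differs from its machineName, which is the symptom suspected here. It assumes the response has the shape shown in the sample above; the function name `mismatched_admin_urls` is hypothetical, and the script only parses JSON text that you have already retrieved with a valid token.

```python
import json
from urllib.parse import urlparse


def mismatched_admin_urls(machines_json):
    """Return (machineName, adminURL) pairs whose adminURL host is not the machine name."""
    payload = json.loads(machines_json)
    problems = []
    for machine in payload.get("machines", []):
        name = machine.get("machineName", "")
        admin_url = machine.get("adminURL", "")
        # urlparse(...).hostname is lower-cased and excludes the port.
        admin_host = urlparse(admin_url).hostname or ""
        if admin_host != name.lower():
            problems.append((name, admin_url))
    return problems


if __name__ == "__main__":
    # Hypothetical response in which the primary picked up the load balancer name.
    sample = json.dumps({"machines": [
        {"machineName": "standbyportal.domain.com",
         "adminURL": "https://standbyportal.domain.com:7443/arcgis",
         "role": "standby", "platform": ""},
        {"machineName": "primaryportal.domain.com",
         "adminURL": "https://lb.domain.com:7443/arcgis",
         "role": "primary", "platform": ""},
    ]})
    for name, url in mismatched_admin_urls(sample):
        print(f"{name}: adminURL points at {url}")
```

An empty result means every adminURL uses the machine's own name, as in the healthy sample above.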

makariw commented 2 years ago

Hi Cameron,

The values defined for arcgis.portal.system_properties.privatePortalURL and arcgis.portal.system_properties.WebContextURL are:

"system_properties": {
   "privatePortalURL": "https://lb.domain.com/portal",
   "WebContextURL": "https://lb.domain.com/portal"
}

I have tried to check the adminURL from https://portal1.domain.com:7443/arcgis/portaladmin/machines/machines?f=pjson

but it does not load; it redirects me to https://lb.domain.com:7443/arcgis/portaladmin/

Your last point here makes sense as I did exactly that to get the federation stage on the primary portal machine to complete successfully. I added the IP of the primary machine and lb.domain.com in the /etc/hosts file.

I then removed the entry after the installation was complete

Is there any way I can correct this after installation? Do I need to reinstall?

Regards,

Makari.
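
To confirm that no hosts file entry still maps the load balancer name to a local address before installing, something like the Python sketch below could be used. It is only an illustration: the function name `hosts_entries_for` is hypothetical, the IP addresses in the example are made up, and on Windows the hosts file normally lives at C:\Windows\System32\drivers\etc\hosts.

```python
def hosts_entries_for(hosts_text, hostname):
    """Return the addresses that a hosts file maps to the given host name."""
    wanted = hostname.lower()
    addresses = []
    for raw_line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if wanted in (name.lower() for name in names):
            addresses.append(address)
    return addresses


if __name__ == "__main__":
    # Made-up hosts file content for illustration only.
    example = """
    127.0.0.1   localhost
    10.0.0.5    lb.domain.com   # temporary entry added for federation
    """
    print(hosts_entries_for(example, "lb.domain.com"))
```

A non-empty result for lb.domain.com on a portal machine would be worth removing before the portal is installed, since the name picked up at install time is what ends up in the adminURL.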

cameronkroeker commented 2 years ago

@makariw,

Ah okay that makes sense. Please check the following files:

C:\Program Files\ArcGIS\Portal\framework\etc\hostname.properties

and

C:\Program Files\ArcGIS\Portal\framework\runtime\ds\framework\etc\hostidentifier.properties

They are likely set to lb.domain.com, which is the culprit; they should be set to the local hostname or IP. You can try stopping the portal service in services.msc, updating those two files, and then starting the portal again to see if it helps. However, since the portal site has already been created, we will likely need to uninstall and reinstall in order to change these. If the portal had only been installed and the site had not been created yet, we could manually modify those files, restart the service, and create the site without having to reinstall.

I suggest using the following attributes in your json to avoid portal picking the wrong value from /etc/hosts:
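
A minimal sketch of such attributes, assuming they are the `hostname` and `hostidentifier` attributes under `arcgis.portal` that are named in the follow-up reply in this thread. The host name `portal1.domain.com` is a placeholder and the helper `portal_host_attributes` is hypothetical; the Python below only prints a JSON fragment to merge into the node's attributes file.

```python
import json


def portal_host_attributes(portal_hostname):
    """Build the arcgis.portal attributes that pin the portal's host name.

    Assumption: both attributes are set to the machine's own host name,
    as was done in the follow-up reply in this thread.
    """
    return {
        "arcgis": {
            "portal": {
                "hostname": portal_hostname,
                "hostidentifier": portal_hostname,
            }
        }
    }


if __name__ == "__main__":
    # Placeholder host name; use the local name of each portal machine.
    print(json.dumps(portal_host_attributes("portal1.domain.com"), indent=3))
```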

makariw commented 2 years ago

Hi Cameron,

I do not have C:\Program Files\ArcGIS\Portal\framework\etc\hostname.properties

C:\Program Files\ArcGIS\Portal\framework\runtime\ds\framework\etc\hostidentifier.properties has the entry below:

preferredidentifier=hostname

makariw commented 2 years ago

Hi Cameron,

I have decided to reinstall.

So, I have added the following

node['arcgis']['portal']['hostname'] = portal1 hostname
node['arcgis']['portal']['hostidentifier'] = portal1 hostname

However, I get the following

arcgis_enterprise_portal[Federate Server] action federate_server
[2022-07-01T10:23:41+00:00] WARN: Util.wait_until_url_available timed out for https://lb.domain.com/server/admin/?f=json after 1004.91 seconds.

Any suggestions here? By the way, privatePortalURL and WebContextURL are both set to https://lb.domain.com/portal.

Regards,

makari.

cameronkroeker commented 2 years ago

Hi @makariw,

Looks like the error is complaining about the load balancer for server: https://lb.domain.com/server/admin is not accessible or available. We need to ensure the following is reachable from both the portal node and the server node:

$ curl https://lb.domain.com/server/admin 

Some things to check:

Thanks, Cameron K.
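
The curl check above can also be scripted so it retries for a while, which may help when the load balancer takes time to mark a backend healthy. The Python sketch below is an illustration only and is not the cookbook's Util.wait_until_url_available implementation; the function names, timeouts, and the rule that any HTTP status below 500 counts as reachable are all assumptions.

```python
import time
import urllib.error
import urllib.request


def url_is_reachable(url, timeout=10.0):
    """Return True if the server answers with any HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as error:
        # The server answered; 4xx (for example, a token is required) still proves reachability.
        return error.code < 500
    except OSError:
        # Covers URLError, DNS failures, refused connections, TLS errors, and timeouts.
        return False


def wait_until_available(url, max_wait=600.0, interval=10.0,
                         probe=url_is_reachable, sleep=time.sleep):
    """Poll the URL until probe(url) succeeds or max_wait seconds have passed."""
    deadline = time.monotonic() + max_wait
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    target = "https://lb.domain.com/server/admin"
    print("reachable" if wait_until_available(target, max_wait=60.0) else "timed out")
```

Running it on both the portal node and the server node shows whether each machine can reach the server URL through the load balancer.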

makariw commented 2 years ago

Running the install again with exactly the same parameters worked the second time. I am not sure why, but the federation stage completed successfully on the second run.

Thanks, Cameron, for all your help with this issue.

Regards,

Makari