Esri / arcgis-cookbook

Chef cookbooks for ArcGIS
Apache License 2.0

Upgrade patterns #230

Closed drobinson13 closed 4 years ago

drobinson13 commented 5 years ago

Hi,

I have run into a number of issues with the Chef upgrade pattern when upgrading ArcGIS Enterprise from 10.5.1 to 10.7.1.

I am using the following steps (as suggested to upgrade directly to 10.7.1): https://arcgisstore1071.s3.amazonaws.com/11595/docs/UpgradeChefAlloneWin.html

In step 6, I have applied the following five changes to the c:\chef\node.json as stated:

Why does the workflow suggest the following as the final step (step 7): chef-solo -j c:\chef\base_enterprise_allinone_windows.json

This command fires the following error - Cannot load configuration from c:\chef\base_enterprise_allinone_windows.json

I changed the command to point to node.json, as I performed the edits on that file: chef-solo -j c:\chef\node.json

However, I again run into an error message:

================================================================================
Chef encountered an error attempting to load the node data for "mysite.com"

Unknown Server Error:

The server had a fatal error attempting to load the node data.

Running handlers:
[2019-07-29T15:58:06+01:00] ERROR: Running exception handlers
Running handlers complete
[2019-07-29T15:58:06+01:00] ERROR: Exception handlers complete
Chef Infra Client failed. 0 resources updated in 03 seconds
[2019-07-29T15:58:06+01:00] FATAL: Stacktrace dumped to c:/chef/local-mode-cache/cache/chef-stacktrace.out
[2019-07-29T15:58:06+01:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2019-07-29T15:58:06+01:00] FATAL: Net::HTTPFatalError: 500 "Internal Server Error"

Any idea why the workflow isn't working?

Thanks, Damien

cameronkroeker commented 5 years ago

@drobinson13

Are you able to provide the contents of the c:\chef\local-mode-cache\cache\chef-stacktrace.out file?

drobinson13 commented 5 years ago

@cameronkroeker - I've included the stacktrace.out and a Chef output log file (in a zipped folder because GitHub can't handle .out uploads)

chef-stacktrace.zip

cameronkroeker commented 5 years ago

Thanks @drobinson13. From the looks of it, Chef could be having issues loading the data from the nodes dir. Let's try clearing out the C:\chef\nodes dir and the C:\chef\local-mode-cache dir and running again.
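For anyone who wants to script that cleanup, here is a minimal sketch. It assumes the Chef working directory is C:\chef as in this thread; the helper name `clear_chef_state` is made up for illustration and is not part of Chef or the cookbooks. It empties the two directories but leaves the directories themselves in place.

```python
import shutil
from pathlib import Path


def clear_chef_state(chef_root):
    """Empty the nodes and local-mode-cache directories under chef_root.

    Returns the list of directories that were emptied. Directories that
    do not exist are skipped.
    """
    cleared = []
    for name in ("nodes", "local-mode-cache"):
        target = Path(chef_root) / name
        if not target.is_dir():
            continue  # nothing to clear for this directory
        for child in target.iterdir():
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)  # remove a real subdirectory recursively
            else:
                child.unlink()  # remove a file or a symlink
        cleared.append(target)
    return cleared
```

Calling `clear_chef_state(r"C:\chef")` before re-running chef-solo would do the same thing as deleting the contents of both folders by hand.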

drobinson13 commented 5 years ago

Thanks @cameronkroeker - clearing out the cache and nodes directories allowed the run to go a lot longer. It seems to have gotten to the point of configuring the web adaptor and then failed, indicating it couldn't access the URL.

[2019-08-02T14:47:35+01:00] WARN: *****************************************
[2019-08-02T14:47:35+01:00] WARN: Did not find config file: C:/chef/client.rb. Using command line options instead.
[2019-08-02T14:47:35+01:00] WARN: *****************************************
[2019-08-02T15:12:23+01:00] WARN: ArcGIS Server site already exists.
[2019-08-02T15:13:32+01:00] ERROR: Failed to configure Web Adaptor with ArcGIS Server. Expected process to exit with [0], but received '1'
STDOUT: ERROR: Unable to connect to WebAdaptorURL : https://domain.com/server/webadaptor
STDERR:

cameronkroeker commented 5 years ago

@drobinson13

Glad to hear clearing out the cache and nodes dir allowed the chef run to get a bit further. As for the WA issue, I would check the following things:

Also, are there any IIS attributes being passed in your role json file? For example:

https://github.com/Esri/arcgis-cookbook/tree/master/cookbooks/esri-iis

drobinson13 commented 5 years ago

@cameronkroeker

I have done extensive testing against multiple ArcGIS Enterprise versions.

To upgrade to 10.7.1 - you need cookbooks version 3.4.0 and Chef 13 or 14.

If you don't use cookbooks version 3.4.0 you get the following error message: [2019-08-02T12:29:06+01:00] FATAL: Chef::Exceptions::CookbookChefVersionMismatch: Cookbook 'arcgis-enterprise' version '3.1.0' depends on chef version [">= 12.6", "< 13.0"], but the running chef version is 14.13.11
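To make the mismatch concrete, here is a rough illustration of how that pair of constraints evaluates against the running version. This is not Chef's own constraint code, just a simplified stand-in that handles plain comparison operators on dotted numeric versions; the function names are made up for the example.

```python
import operator

# Comparison operators supported by this simplified sketch.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def parse_version(text, width=3):
    """Turn '12.6' into (12, 6, 0) so versions compare component by component."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, constraints):
    """Return True only if version meets every constraint in the list."""
    running = parse_version(version)
    for constraint in constraints:
        op, required = constraint.split()
        if not OPS[op](running, parse_version(required)):
            return False
    return True


# Values taken from the error message above.
print(satisfies("14.13.11", [">= 12.6", "< 13.0"]))
```

With the values from the error message the check fails on the "< 13.0" half, which is why the 3.1.0 cookbooks refuse to run under Chef 14.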

There is an issue in configuring web adaptors - I get the following message:

[2019-08-07T14:01:41+01:00] ERROR: Failed to configure Web Adaptor with Portal for ArcGIS. Expected process to exit with [0], but received '1'
---- Begin output of "C:\Program Files (x86)\Common Files\ArcGIS\WebAdaptor\IIS\10.7.1\Tools\ConfigureWebAdaptor.exe" /m portal /w "https://mydns/portal/webadaptor" /g "https://internalIP:7443" /u "myadminuser" /p "mypassword" ----
STDOUT: ERROR: Unable to connect to WebAdaptorURL : https://mydns/portal/webadaptor
[2019-08-07T14:11:43+01:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2019-08-07T14:11:43+01:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: arcgis_enterprise_webadaptor[Configure Web Adaptor with Portal] (arcgis-enterprise::portal_wa line 22) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'

I can access the web adaptor landing page via the browser on the instance, and can configure it manually.

I can then re-run the upgrade (it skips over the upgraded Portal and its new web adaptor), however it stalls again on configuring the web adaptor for ArcGIS Server.

[2019-08-07T17:06:26+01:00] ERROR: Failed to configure Web Adaptor with ArcGIS Server. Expected process to exit with [0], but received '1'
---- Begin output of "C:\Program Files (x86)\Common Files\ArcGIS\WebAdaptor\IIS\10.7.1\Tools\ConfigureWebAdaptor.exe" /m server /w "https://mydns/server/webadaptor" /g "https://internalIP:6443" /u "myadminuser" /p "mypassword" /a true ----
STDOUT: ERROR: Unable to connect to WebAdaptorURL : https://mydns/server/webadaptor
[2019-08-07T17:06:26+01:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: arcgis_enterprise_webadaptor[Configure Web Adaptor with Server] (arcgis-enterprise::server_wa line 22) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'

Again I can access the web adaptor landing page via the browser on the instance, and can configure it manually.

I can then re-run the upgrade (it skips over the upgraded Portal and its new web adaptor, the upgraded server and its new web adaptor) - the upgrade is subsequently successful.

I have tested upgrading 10.5.x, 10.6.x and 10.7 to 10.7.1, and the web adaptor fails to configure every time. I have tried increasing the timeout on the Portal web adaptor configuration (when configuring manually it can take over 5 mins to report success), however it still failed. The server web adaptor registers instantly when configured manually but fails in Chef, so I'm confident the issue isn't timeout related and that it is consistent across all web adaptor configurations.

Thanks, Damien

drobinson13 commented 4 years ago

Hi @cameronkroeker, have you been able to replicate the issues with the web adaptor configuration that I've experienced?

Nickolaitc commented 4 years ago

Hey @drobinson13,

Could you perhaps provide us with your nodes.json templates? It will help us check values and see if, perhaps, we can repro some behavior you are witnessing.

cameronkroeker commented 4 years ago

Hi @drobinson13,

I have not been able to reproduce this issue. In fact, this morning I deployed ArcGIS Enterprise 10.5 using the CloudFormation template, then used Chef to upgrade it to 10.7.1, and it was successful.

Are you able to run the following from command line manually?

"C:\Program Files (x86)\Common Files\ArcGIS\WebAdaptor\IIS\10.7.1\Tools\ConfigureWebAdaptor.exe" /m portal /w "https://mydns/portal/webadaptor" /g "https://internalIP:7443" /u "myadminuser" /p "mypassword"

Is the portal content and AGS config-store in cloud storage (S3/DynamoDB) or on the file system (local or shared)?

Also, as @Nickolaitc has mentioned could you provide both the node.json and upgrade node.json files?

Thanks, Cameron Kroeker

drobinson13 commented 4 years ago

Hi @cameronkroeker , @Nickolaitc

Here is my node.json file (as .txt to allow me to add it as an attachment) - I edited the existing node.json as per step 6 here: https://arcgisstore1071.s3.amazonaws.com/11595/docs/UpgradeChefAlloneWin.html - hence my node.json and upgrade node.json are one and the same.

node.txt

Here is my output log from the upgrade process: log.txt

My environment is a simple 10.5.1 cloudformation deployment with local config-store.

When the command line web adaptor configuration fails as part of the chef process, I am able to access the site through the internal IP in a browser and configure it manually (to confirm there are no access issues, which is what the log errors seem to suggest) - hence I haven't tried re-running the command line yet, I suspect it would work though.

Thanks, Damien

cameronkroeker commented 4 years ago

Thanks @drobinson13.

The logs are suggesting that the DNS WA Url can't be accessed:

STDOUT: ERROR: Unable to connect to WebAdaptorURL : https://MYDNS/server/webadaptor

In the post above you mentioned that you can access the WA URL via the internal IP in the browser and configure it successfully. However, Chef is attempting to use https://MYDNS/server/webadaptor to configure the web adaptor, and that URL can't be reached.

I recommend checking the C:\Windows\system32\drivers\etc\hosts file on the WA machine to ensure it has a mapping of the internal ip to MYDNS.
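A quick way to check that mapping without eyeballing the file is sketched below. The parsing follows the usual hosts file layout (an IP address followed by one or more names, with `#` starting a comment); the function names and the 10.0.0.5 address in the sample are placeholders for illustration only.

```python
def hosts_entries(hosts_text):
    """Map each hostname (lowercased) to the IP address it is listed under."""
    mapping = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)  # keep the first entry found
    return mapping


def maps_to(hosts_text, hostname, expected_ip):
    """Return True if hostname is mapped to expected_ip in the hosts text."""
    return hosts_entries(hosts_text).get(hostname.lower()) == expected_ip


# Placeholder content; on the WA machine, read the real file instead, e.g.
# open(r"C:\Windows\system32\drivers\etc\hosts").read()
sample = "# sample hosts file\n10.0.0.5  MYDNS  # WA machine\n127.0.0.1 localhost\n"
print(maps_to(sample, "MYDNS", "10.0.0.5"))
```

Hostnames are compared case-insensitively, since DNS names are not case sensitive.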

Then try the following two tests:

  1. Launch web browser and browse to https://MYDNS/server/webadaptor to ensure this resolves correctly.
  2. Run the following command from cmd:

"C:\Program Files (x86)\Common Files\ArcGIS\WebAdaptor\IIS\10.7.1\Tools\ConfigureWebAdaptor.exe" /m server /w "https://MYDNS/server/webadaptor" /g "https://MYINTERNALIP:6443" /u "MYADMINUSER" /p "MYADMINUSERPASSWORD" /a true
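If it is easier to run the second test from a script and capture the exit code, a sketch along these lines should work. It only uses the switches that appear in the commands in this thread (/m, /w, /g, /u, /p and /a true); the function names are invented for the example, and the tool path assumes the default 10.7.1 install location shown above.

```python
import subprocess

# Default install path as it appears in the Chef log; adjust if yours differs.
TOOL = (
    r"C:\Program Files (x86)\Common Files\ArcGIS\WebAdaptor\IIS\10.7.1"
    r"\Tools\ConfigureWebAdaptor.exe"
)


def build_command(mode, wa_url, machine_url, user, password, admin_access=False):
    """Build the argument list for ConfigureWebAdaptor.exe.

    mode is 'server' or 'portal', matching the /m values used in this thread.
    """
    args = [TOOL, "/m", mode, "/w", wa_url, "/g", machine_url,
            "/u", user, "/p", password]
    if admin_access:
        args += ["/a", "true"]  # only seen on the server command in the logs
    return args


def run_configure(args):
    """Run the tool and return (exit code, stdout). Windows only."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout
```

Passing the arguments as a list avoids the quoting problems that come with pasting the whole command into cmd. An exit code of 1 together with "Unable to connect to WebAdaptorURL" in the output would match what Chef reports.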

drobinson13 commented 4 years ago

Thanks @cameronkroeker, I can confirm the hosts file is as expected. With regard to the MYDNS web adaptor URL - I can access it without issue in a browser, however it fails on the command line.

Browser: browser

Command Line: commandline

cameronkroeker commented 4 years ago

@drobinson13

Interesting results. This explains why the Chef Run is failing to configure the Server/Portal WA. Chef uses the WA command line tool and this tool is failing outside of Chef. So the issue here isn't with Chef, but with the WA command line tool.

On a hunch I suspected the issue was related to TLS, and I was able to reproduce it by disabling client TLS 1.0 in the Windows registry. As soon as I added the following, I got the exact same error as you:

TLS1_disabled

So with TLS 1.0 disabled, it appears the configuration is successful via the web browser but not via the command line tool. As soon as I re-enable TLS 1.0, the command line tool works again.

I recommend opening regedit on the WA machine and checking HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Client to see if TLS 1.0 is disabled. If it is, try enabling it and running the silent configure command again.
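The same check can be scripted. The sketch below assumes the key uses the standard SCHANNEL DWORD values `Enabled` and `DisabledByDefault`; the screenshot above is not reproduced here, so treat those value names as an assumption about what was set. The function names are made up for the example, and the registry read only works on Windows.

```python
KEY_PATH = (
    r"SYSTEM\CurrentControlSet\Control\SecurityProviders"
    r"\SCHANNEL\Protocols\TLS 1.0\Client"
)


def tls_client_disabled(values):
    """Interpret the values found under the TLS 1.0 Client key.

    Treats Enabled == 0 or a non-zero DisabledByDefault as disabled.
    An empty dict (key or values missing) means the OS default applies,
    which this sketch reports as not explicitly disabled.
    """
    enabled = values.get("Enabled")
    disabled_by_default = values.get("DisabledByDefault")
    return enabled == 0 or bool(disabled_by_default)


def read_tls10_client_values():
    """Read Enabled and DisabledByDefault from the registry (Windows only)."""
    import winreg  # imported here so the module still loads on other systems

    values = {}
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            for name in ("Enabled", "DisabledByDefault"):
                try:
                    values[name], _ = winreg.QueryValueEx(key, name)
                except FileNotFoundError:
                    pass  # this value is not set under the key
    except FileNotFoundError:
        pass  # the key itself does not exist
    return values
```

On the WA machine, `tls_client_disabled(read_tls10_client_values())` returning True would point to the same condition that reproduced the error here.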

cameronkroeker commented 4 years ago

@drobinson13

There is a better workaround for this issue listed here: https://support.esri.com/en/bugs/nimbus/QlVHLTAwMDEyMjIzMw==

drobinson13 commented 4 years ago

Thanks @cameronkroeker and @Nickolaitc for all your assistance with this issue! I disable TLS 1.0 as a post-deployment step - I'll run the script listed in the bug workaround to ensure the new default is TLS 1.2 from now on!