Esri / arcgis-cookbook

Chef cookbooks for ArcGIS
Apache License 2.0

Join site is hanging and rerunning script results in success #292

Closed bturrell closed 3 years ago

bturrell commented 3 years ago

I'm running a highly available ArcGIS Server stack using the recommended Esri CloudFormation template, with some additional changes to make it fit into our infrastructure. As part of our security requirements, I'm baking our own AMI with only the minimal software required, namely Chef and ArcGIS Server.

When one of the nodes spins up, it runs the node.json script and hangs at join_site. I can see in the manager that it has joined the site. If I cancel the script and rerun it, it completes as expected and the node comes up as healthy. I have tried a variety of combinations of Chef, CINC, and cookbook versions. Below are my results from each pairing.

Esri cookbooks 3.7.0, ArcGIS Server 10.8.1, Chef 17.2.29, Windows Server 2019:
The node script runs fine until the join site step. The node joins the site, but the Chef script freezes. Closing and relaunching the script results in success.

Esri cookbooks 3.7.0, ArcGIS Server 10.8.1, CINC 17.2.29, Windows Server 2019:
The script doesn't run; I get the error below. The openssl.so file does exist, and the script is run with admin rights, so it can see the file.

PS C:\Windows\system32> cinc-solo
[2021-06-29T05:04:29+00:00] WARN: No config file found or specified on command line. Using command line options instead.
[2021-06-29T05:04:30+00:00] WARN:
[2021-06-29T05:04:30+00:00] WARN: Did not find config file: C:/cinc/client.rb. Using command line options instead.
[2021-06-29T05:04:30+00:00] WARN:
[2021-06-29T05:04:30+00:00] FATAL: LoadError: 126: The specified module could not be found. - C:/cinc-project/cinc/embedded/lib/ruby/3.0.0/x64-mingw32/openssl.so

Esri cookbooks 3.7.0, ArcGIS Server 10.8.1, Chef 15.14.0 / CINC 15.14.0, Windows Server 2019:
With this version of Chef or CINC, I get the following error when Chef tries to start ArcGIS Server.

[2021-06-29T04:54:23+00:00] INFO: Processing arcgis_enterprise_server[Start ArcGIS Server after upgrade] action start (arcgis-enterprise::server_node line 22)
[2021-06-29T04:54:23+00:00] INFO: Processing windows_env[arcgis_cloud_platform] action create (c:/chef/local-mode-cache/cache/cookbooks/arcgis-enterprise/providers/server.rb line 589)
[2021-06-29T04:54:24+00:00] INFO: Running queued delayed notifications before re-raising exception
[2021-06-29T04:54:24+00:00] INFO: Running queued delayed notifications before re-raising exception
[2021-06-29T04:54:24+00:00] ERROR: Running exception handlers
[2021-06-29T04:54:24+00:00] ERROR: Exception handlers complete
[2021-06-29T04:54:24+00:00] FATAL: Stacktrace dumped to c:/chef/local-mode-cache/cache/chef-stacktrace.out
[2021-06-29T04:54:24+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2021-06-29T04:54:24+00:00] DEBUG: RangeError: arcgis_enterprise_server[Start ArcGIS Server after upgrade] (arcgis-enterprise::server_node line 22) had an error: RangeError: windows_env[arcgis_cloud_platform] (c:/chef/local-mode-cache/cache/cookbooks/arcgis-enterprise/providers/server.rb line 589) had an error: RangeError: bignum too big to convert into `long'

If I run the script again, it proceeds to the join site step and hangs there. Cancelling the script and running it again (the same as with version 17.2.29) results in success.

Any advice would be appreciated!

Thanks

Ben

cameronkroeker commented 3 years ago

Hi @bturrell,

Can you confirm if node['arcgis']['server']['use_join_site_tool'] is set to true or false in your node.json?

Also, could you provide the stack trace file referenced in this log line?

[2021-06-29T04:54:24+00:00] FATAL: Stacktrace dumped to c:/chef/local-mode-cache/cache/chef-stacktrace.out

Thanks, Cameron K.

bturrell commented 3 years ago

Hey Cameron,

node['arcgis']['server']['use_join_site_tool'] is set to true in our node.json.
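For readers checking the same setting, here is a minimal sketch of how the attribute path node['arcgis']['server']['use_join_site_tool'] maps onto nested keys in node.json. The JSON fragment and the helper name `use_join_site_tool` are hypothetical illustrations; a real node.json will contain many other attributes (run list, install paths, credentials) that are omitted here.

```ruby
require 'json'

# Hypothetical node.json fragment. Only the attribute path itself comes from
# the thread; everything else a real node.json contains is omitted.
NODE_JSON = <<~JSON
  {
    "arcgis": {
      "server": {
        "use_join_site_tool": true
      }
    }
  }
JSON

# Returns the value of arcgis.server.use_join_site_tool, or nil when the
# attribute (or one of its parent keys) is not present in the JSON text.
def use_join_site_tool(json_text)
  JSON.parse(json_text).dig('arcgis', 'server', 'use_join_site_tool')
end

puts use_join_site_tool(NODE_JSON).inspect
```

A nil result means the attribute is not set in the file, in which case whatever default the cookbook defines would apply.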

Our most successful builds, which we are currently troubleshooting, run Chef 17.2.29. We don't have a stack trace dump at the moment because the script technically doesn't crash; it just hangs on the join site step.

Thanks,

Ben

bturrell commented 3 years ago

Hey Cameron,

We are now getting an error message on the join_site. Stack trace is below:

Generated at 2021-07-01 02:59:00 +0000
Errno::ENOENT: arcgis_enterprise_server[Join ArcGIS Server Site] (arcgis-enterprise::server_node line 77) had an error: Errno::ENOENT: No such file or directory - GetProfileType
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/mixlib-shellout-3.2.5-universal-mingw32/lib/mixlib/shellout/windows/core_ext.rb:396:in `get_profile_type'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/mixlib-shellout-3.2.5-universal-mingw32/lib/mixlib/shellout/windows/core_ext.rb:390:in `logon_has_roaming_profile?'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/mixlib-shellout-3.2.5-universal-mingw32/lib/mixlib/shellout/windows/core_ext.rb:338:in `create3'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/mixlib-shellout-3.2.5-universal-mingw32/lib/mixlib/shellout/windows.rb:91:in `run_command'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/mixlib-shellout-3.2.5-universal-mingw32/lib/mixlib/shellout.rb:270:in `run_command'
c:/chef/local-mode-cache/cache/cookbooks/arcgis-enterprise/providers/server.rb:358:in `block in class_from_file'
(eval):2:in `block in action_join_site'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/provider.rb:276:in `instance_eval'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/provider.rb:276:in `compile_and_converge_action'
(eval):2:in `action_join_site'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/provider.rb:217:in `run_action'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/resource.rb:600:in `block in run_action'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/resource.rb:627:in `with_umask'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/resource.rb:599:in `run_action'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/runner.rb:74:in `run_action'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/runner.rb:108:in `block in run_all_actions'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/runner.rb:108:in `each'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/runner.rb:108:in `run_all_actions'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/runner.rb:132:in `block in converge'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/resource_collection/resource_list.rb:96:in `block in execute_each_resource'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/resource_collection/stepable_iterator.rb:54:in `each_with_index'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/resource_collection/resource_list.rb:94:in `execute_each_resource'
C:/opscode/chef/embedded/lib/ruby/3.0.0/forwardable.rb:238:in `execute_each_resource'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/runner.rb:130:in `converge'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/client.rb:687:in `block in converge'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/client.rb:682:in `catch'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/client.rb:682:in `converge'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/client.rb:706:in `converge_and_save'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/client.rb:286:in `run'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/application.rb:305:in `run_with_graceful_exit_option'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/application.rb:281:in `block in run_chef_client'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/local_mode.rb:42:in `with_server_connectivity'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/application.rb:264:in `run_chef_client'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/application/base.rb:352:in `run_application'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/application.rb:67:in `run'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-17.2.29-universal-mingw32/lib/chef/application/solo.rb:60:in `run'
C:/opscode/chef/embedded/lib/ruby/gems/3.0.0/gems/chef-bin-17.2.29/bin/chef-solo:24:in `<top (required)>'
C:/opscode/chef/bin/chef-solo:155:in `load'
C:/opscode/chef/bin/chef-solo:155:in `<main>'

cameronkroeker commented 3 years ago

Hi @bturrell thanks for providing the stack trace.

It looks like Chef's mixlib-shellout is failing to retrieve the user profile type (core_ext.rb line 396):

https://github.com/chef/mixlib-shellout/blob/32a53fe16be55d8dc2ea483322aa215ed9531494/lib/mixlib/shellout/windows/core_ext.rb#L388-L397

Ruby is likely calling the Windows GetProfileType function, which is likely returning false:

https://docs.microsoft.com/en-us/windows/win32/api/userenv/nf-userenv-getprofiletype#return-value

"If the user profile is not already loaded, the function fails. Note that the caller must have KEY_READ access to HKEY_LOCAL_MACHINE. This access right is granted by default. For more information, see Registry Key Security and Access Rights."

Thanks, Cameron K.

bturrell commented 3 years ago

All resolved. We had a group policy setting that didn't include the Administrators group, which was causing our failure:

Computer Configuration\Windows Settings\Security Settings\Local Policies\User Rights Assignment\Replace a process level token

Adding the Administrators group to that policy allowed the join site to work.
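For anyone wanting to check for the same condition before running the cookbook, the following is a rough sketch rather than anything from the cookbook itself. It assumes the user rights have been exported to text, for example with `secedit /export /cfg <file> /areas USER_RIGHTS`, where "Replace a process level token" appears under its constant name SeAssignPrimaryTokenPrivilege and the built-in Administrators group appears as the well-known SID S-1-5-32-544. The helper names and the sample text are hypothetical, and a real export file is typically UTF-16 encoded, so it would need to be decoded before being passed in.

```ruby
# Well-known SID of the built-in Administrators group.
ADMINISTRATORS_SID = 'S-1-5-32-544'
# Constant name of the "Replace a process level token" user right.
REPLACE_TOKEN_RIGHT = 'SeAssignPrimaryTokenPrivilege'

# Returns the accounts/SIDs assigned to a user right in exported policy text,
# or an empty array when the right is not listed at all.
def holders(cfg_text, right)
  line = cfg_text.lines.map(&:strip).find do |l|
    l.start_with?("#{right} ") || l.start_with?("#{right}=")
  end
  return [] unless line

  # Entries are comma separated; SIDs carry a leading '*' in the export.
  line.split('=', 2).last.split(',').map { |s| s.strip.delete_prefix('*') }
end

# True when the Administrators group holds "Replace a process level token".
def administrators_can_replace_token?(cfg_text)
  holders(cfg_text, REPLACE_TOKEN_RIGHT).include?(ADMINISTRATORS_SID)
end

# Illustrative sample only; not taken from the machine discussed above.
SAMPLE = <<~CFG
  [Privilege Rights]
  SeAssignPrimaryTokenPrivilege = *S-1-5-19,*S-1-5-20
CFG

puts administrators_can_replace_token?(SAMPLE)
```

Note that the check only looks for the Administrators SID itself; an account that holds the right through some other group or by name would not be detected by this sketch.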

Thanks for your help @cameronkroeker