Esri / arcgis-cookbook

Chef cookbooks for ArcGIS
Apache License 2.0

Chef Error: No nodes returned from search #305

Closed Mike-DM closed 2 years ago

Mike-DM commented 2 years ago

Hello, I'm currently attempting to deploy ArcGIS Enterprise 10.9.1 using Chef. I have the Chef Server (RHEL7) and Chef Workstation (Server 2019) configured. Everything was going somewhat smoothly until today, when I attempted to start the Portal install on one of my VMs and received a "FATAL: No nodes returned from search" error.

I didn't have any issues bootstrapping the VM, and when I run 'knife node list' and 'knife client list' the node shows up. However, in Chef Manage it only shows under Administration > Group > Clients, not under the main section for Nodes or Clients. I've also uploaded some roles using the knife upload roles command; the command executed without error, but the roles do not appear in Chef Manage either. The cookbooks, which I also uploaded with the knife upload command, do appear in Chef Manage.

I've tried running bootstrap on the VM again, but that doesn't change the error. At one point the disk on my Chef Server was full (I've since cleared it), and I'm wondering if that corrupted something. I've run chef-server-ctl reconfigure as well as chef-server-ctl reindex -a, but neither resolved the error.

The client.rb file on the VM has the correct information, and I can ping from the Chef Server to the Chef Workstation and the client.

Any ideas on what I could be doing wrong or what steps I can take to help troubleshoot and fix this issue?

The software versions I'm using: Chef-Workstation 11.679, Chef Manage 3.2.13, Chef Server Core 14.11.15

Thanks, Mike

cameronkroeker commented 2 years ago

Hi @Mike-DM,

Try updating the node's run-list to be empty, then run chef-client to get some basic attributes onto the Chef server. This might get the search to work. See: https://stackoverflow.com/a/18580544

If this doesn't work I am not entirely sure what is causing the search to not work. Might need to fully remove all traces of the node and re-bootstrap it.

Thanks, Cameron K.

Mike-DM commented 2 years ago

Hi Cameron,

I tried updating the node's run-list to empty and running chef-client, but it didn't resolve the error. It's weird because I can use knife search and it finds the node and even lists information about it, but when I try to run the command:

knife winrm 'role:arcgis-portal-primary' 'chef-client' --winrm-shell elevated -x 'username' -P 'password'

It says "No nodes returned from search" even though there is a node with that role in the run-list.

I'm now uninstalling Chef Client and Chef Server and I'm going to install earlier versions I had success with in a dev environment. I want to see if perhaps it's a bug with the software I'm using or if something in my environment was causing the issue.

Mike-DM commented 2 years ago

I updated the node's run-list to empty and ran chef-client. That does work: Chef ran on the node without error. However, back on the Chef Workstation, it still gives a "No nodes returned from search" error when attempting to execute Chef on the node. I have been troubleshooting this for the past few days, but still haven't narrowed down the cause. My assumption is there's some kind of issue resolving the hostname of the node, but in everything I have checked the hostname is resolving just fine. Also, knife doesn't have any issues finding the node with knife node list or knife client list. I can even search directly for the node with knife search node name:node and it displays the node and its attributes. Chef Manage is also displaying the node and all the correct attributes for it. It's only the knife winrm command to execute the run_list assigned to the node that fails with a "No nodes returned from search" error.

cameronkroeker commented 2 years ago

Hi @Mike-DM,

Could you share the bootstrap command used, along with the command used to set the run-list? Perhaps the value passed in the --node-name during the bootstrap does not match the value that was passed in when setting the run-list.
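That comparison can be done mechanically once both commands are at hand. The following is a minimal Python sketch, not part of knife or the cookbook: the helper names are hypothetical, the bootstrap command is an abridged copy of the one in this thread, and it assumes the node name is passed with the long `--node-name` flag.

```python
import shlex

# Commands as used in this thread (bootstrap abridged; placeholders left as-is).
BOOTSTRAP_CMD = (
    "knife bootstrap -o winrm HOSTNAME -U 'username' -P 'password' "
    "--node-name portal-primary --bootstrap-version '15.14.0' "
    "--chef-license accept"
)
RUN_LIST_CMD = (
    "knife node run_list set portal-primary 'role[arcgis-portal-primary-role]'"
)


def bootstrap_node_name(command: str) -> str:
    """Return the value that follows --node-name in a knife bootstrap command."""
    tokens = shlex.split(command)
    if "--node-name" not in tokens:
        raise ValueError("command has no --node-name flag")
    index = tokens.index("--node-name")
    if index + 1 >= len(tokens):
        raise ValueError("--node-name has no value")
    return tokens[index + 1]


def run_list_node_name(command: str) -> str:
    """Return NODE from 'knife node run_list set NODE ITEM'."""
    tokens = shlex.split(command)
    if tokens[:4] != ["knife", "node", "run_list", "set"] or len(tokens) < 6:
        raise ValueError("not a 'knife node run_list set NODE ITEM' command")
    return tokens[4]


def node_names_match(bootstrap_cmd: str, run_list_cmd: str) -> bool:
    """True when both commands refer to the same node name."""
    return bootstrap_node_name(bootstrap_cmd) == run_list_node_name(run_list_cmd)


if __name__ == "__main__":
    print(node_names_match(BOOTSTRAP_CMD, RUN_LIST_CMD))
```

The sketch only compares strings; it does not contact the Chef server, so a match here rules out a naming slip but nothing else.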

Bootstrap machine HOSTNAME with node name set to portal-primary:

knife bootstrap -o winrm HOSTNAME -U 'username' -P 'password' --node-name portal-primary --bootstrap-version '15.14.0' --secret-file "C:/chef-repo/.chef/your-chef-server-encryption-key-file.pem" --chef-license accept

Upload role file:

knife upload roles\arcgis-portal-primary-role.json

Set portal-primary node to arcgis-portal-primary-role:

knife node run_list set portal-primary 'role[arcgis-portal-primary-role]'

Run chef-client on all nodes that are set to role:arcgis-portal-primary-role:

knife winrm 'role:arcgis-portal-primary-role' 'chef-client' --winrm-shell elevated -x 'username' -P 'password'
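The role named in the search query of the last step has to be spelled exactly like the role in the run-list item. As a minimal Python sketch (helper names are hypothetical, and it only builds argument lists rather than calling knife), the query can be derived from the run-list item and previewed with `knife search node QUERY -i` before handing the same query to `knife winrm`:

```python
import re
from typing import List

# Matches a run-list item of the form role[NAME].
_ROLE_ITEM = re.compile(r"role\[([^\[\]]+)\]")


def role_search_query(run_list_item: str) -> str:
    """Turn 'role[NAME]' into the search query 'role:NAME'."""
    match = _ROLE_ITEM.fullmatch(run_list_item.strip())
    if match is None:
        raise ValueError(f"not a role run-list item: {run_list_item!r}")
    return f"role:{match.group(1)}"


def preview_command(query: str) -> List[str]:
    """Argument list for listing only the IDs of nodes that match the query."""
    return ["knife", "search", "node", query, "-i"]


def winrm_command(query: str, user: str, password: str) -> List[str]:
    """Argument list mirroring the knife winrm call used in this thread."""
    return [
        "knife", "winrm", query, "chef-client",
        "--winrm-shell", "elevated",
        "-x", user, "-P", password,
    ]


if __name__ == "__main__":
    query = role_search_query("role[arcgis-portal-primary-role]")
    # Run the preview first; if it lists no nodes, knife winrm will not find any either.
    print(" ".join(preview_command(query)))
    print(" ".join(winrm_command(query, "username", "password")))
```

Building both commands from one run-list item keeps the role name in a single place, so the search and the run-list cannot drift apart.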

Thanks, Cameron K.

Mike-DM commented 2 years ago

Hi Cameron,

Those are the exact commands I'm using; the only difference is that I named the node "portal-active" instead of "portal-primary". In the bootstrap command, where it gives the path to the encryption-key.pem, I have it pointing to the admin.pem file that was included in the Starter Kit. I want to confirm whether that's the correct key file, or whether I should be using the key file that was generated when I created my organization. I named that one validator.pem, but from what I've read it's no longer used to authenticate clients.

cameronkroeker commented 2 years ago

Thanks for confirming the commands, and yes, I believe you are using the correct pem file during bootstrap.

Mike-DM commented 2 years ago

I'm going to close this issue. I haven't been able to figure out why I'm getting the error (it could be FIPS related), but I'm able to work around it by running "chef-client" directly on the nodes after I've set a run_list.