Esri / arcgis-cookbook

Chef cookbooks for ArcGIS
Apache License 2.0

Unable to join Primary Server to File Server (RHEL 8 - ArcGIS 10.9.1) #328

Open DGrady83 opened 2 years ago

DGrady83 commented 2 years ago

We have a new File Server successfully created using arcgis-enterprise-fileserver.json

We are now trying to get the Primary up and running but hitting a snag when attempting to connect Primary to FS (file server)

What is the proper syntax we should be using for arcgis.server.directories_root when running on Linux? Is it simply the FS's IP or alias? Or are there certain "/" characters we need to put in, e.g. "//servername/gisdata/arcgisportal"?

Using cookbooks 4.0 by the way

cameronkroeker commented 2 years ago

Here is an example for portal:

arcgis.portal.content_store_connection_string - Replace FILESERVER with the file server machine hostname or static IP address

https://github.com/Esri/arcgis-cookbook/blob/237b0e39fca0a3b5988b997430bf7beb3637e039/templates/arcgis-enterprise-base/11.0/linux/arcgis-enterprise-primary.json#L71

And here is an example for server:

arcgis.server.directories_root - Replace FILESERVER with the file server machine hostname or static IP address

https://github.com/Esri/arcgis-cookbook/blob/237b0e39fca0a3b5988b997430bf7beb3637e039/templates/arcgis-enterprise-base/11.0/linux/arcgis-enterprise-primary.json#L31

arcgis.server.config_store_connection_string - Replace FILESERVER with the file server machine hostname or static IP address

https://github.com/Esri/arcgis-cookbook/blob/237b0e39fca0a3b5988b997430bf7beb3637e039/templates/arcgis-enterprise-base/11.0/linux/arcgis-enterprise-primary.json#L35
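
A minimal sketch of the substitution described above, assuming the template has been copied locally and that the string FILESERVER appears in it only as the placeholder. The function name set_fileserver is hypothetical and not part of the cookbooks:

```shell
# Hypothetical helper: replace the FILESERVER placeholder in a deployment
# template with the real file server hostname or static IP address.
set_fileserver() {
  host="$1"
  json_file="$2"
  # -i.bak edits the file in place and keeps a backup copy next to it.
  sed -i.bak "s/FILESERVER/${host}/g" "$json_file"
}

# Example (run from the directory that holds the template copy):
#   set_fileserver 10.0.2.231 arcgis-enterprise-primary.json
```

After the substitution, a value such as /net/FILESERVER/gisdata/arcgisportal/content becomes /net/10.0.2.231/gisdata/arcgisportal/content.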

Thanks, Cameron K.

DGrady83 commented 2 years ago

Thanks Cameron. I updated the primary json file with the line below but am still hitting a connection error when creating the site. I did verify that the directory /gisdata/arcgisportal/content is on the FS. How would we confirm if it's an access issue?

"content_store_connection_string": "/net/[FS_machine_IP]/gisdata/arcgisportal/content",

cameronkroeker commented 2 years ago

  • arcgis_enterprise_portal[Create Portal Site] action create_site

    ================================================================================

    Error executing action create_site on resource 'arcgis_enterprise_portal[Create Portal Site]'

    RuntimeError

    Cannot read from directory path '/net/[FS_machine_IP]/gisdata/arcgisportal'. Please check that the location is valid and that the Portal service account has permissions to the location.

Does the arcgis.run_as_user have the same UID on both the file server and portal instance? If they differ, this could cause an issue with accessing the share.

Maybe try using touch command as the arcgis.run_as_user from the portal machine to see if it can create a test file. For example:

$ touch /net/[FS_machine_IP]/gisdata/arcgisportal/content/testFile.txt

Note: If this works delete the testFile.txt.
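
The same check can be wrapped so that the test file is always cleaned up. This is a minimal sketch, assuming it is run as the arcgis.run_as_user on the portal machine; can_write_dir is a hypothetical helper name:

```shell
# Hypothetical helper: returns 0 if the current user can create and remove
# a file in the given directory, non-zero otherwise.
can_write_dir() {
  dir="$1"
  probe="$dir/.write_test_$$"   # $$ (shell PID) keeps the name unlikely to clash
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    return 0
  fi
  return 1
}

# Example:
#   can_write_dir /net/[FS_machine_IP]/gisdata/arcgisportal/content && echo writable
```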

Another thing to try: check /etc/exports on the file server to confirm the directories are shared. You should see something like this:

[arcgis@[FS_machine_IP] /]$ cat /etc/exports
/gisdata/arcgisserver *(rw,sync,insecure,no_subtree_check,nohide)
/gisdata/arcgisbackup *(rw,sync,insecure,no_subtree_check,nohide)
/gisdata/arcgisportal *(rw,sync,insecure,no_subtree_check,nohide)

Thanks, Cameron K.

DGrady83 commented 2 years ago

Hi @cameronkroeker

1) Both machines have the run_as_user set to arcgis so those are both the same

2) The FS does not have a /net directory so is it an issue when trying to run the "content_store_connection_string"? This is what is in our fileserver.json (ran without modifying anything in the template - all directories did get created)

{
    "arcgis": {
        "version": "10.9.1",
        "run_as_user": "arcgis",
        "fileserver": {
            "directories": [
                "/gisdata/arcgisserver",
                "/gisdata/arcgisbackup",
                "/gisdata/arcgisbackup/tilecache",
                "/gisdata/arcgisbackup/relational",
                "/gisdata/arcgisportal",
                "/gisdata/arcgisportal/content"
            ],
            "shares": [
                "/gisdata/arcgisserver",
                "/gisdata/arcgisbackup",
                "/gisdata/arcgisportal"
            ]
        }
    },
    "run_list": [
        "recipe[nfs::server]",
        "recipe[arcgis-enterprise::system]",
        "recipe[arcgis-enterprise::fileserver]"
    ]
}

3) On the Primary server, the /etc/exports file is blank

cameronkroeker commented 2 years ago

Hi @DGrady83,

Let's ensure the /net directory is enabled on the file server node. To do this, try the following on the file server:

  1. Within the /etc/auto.master or /etc/autofs/auto.master file, uncomment the /net -hosts line. For example:
$ cat /etc/auto.master

#
# Sample auto.master file
# This is a 'master' automounter map and it has the following format:
# mount-point [map-type[,format]:]map [options]
# For details of the format look at auto.master(5).
#
/misc   /etc/auto.misc
#
# NOTE: mounts done from a hosts map will be mounted with the
#   "nosuid" and "nodev" options unless the "suid" and "dev"
#   options are explicitly given.
#
/net    -hosts 
#
# Include /etc/auto.master.d/*.autofs
# The included files must conform to the format of this file.
#
+dir:/etc/auto.master.d
#
# Include central master map if it can be found using
# nsswitch sources.
#
# Note that if there are entries for /net or /misc (as
# above) in the included master map any keys that are the
# same will not be seen as the first read key seen takes
# precedence.
#
+auto.master
  2. Then restart autofs:
$ systemctl restart autofs
  3. Then test to ensure the arcgis account can read/write to the shared directory from the primary node:
arcgis@primaryServer:~$ touch /net/[FS_machine_IP]/gisdata/arcgisportal/content/testFile.txt

Note: It's expected that the /etc/exports file is blank on the primary server, as it only gets populated on the node where fileserver.json is executed.
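
To confirm that the /net -hosts line is really active rather than commented out, a quick check along these lines may help. It is only a sketch; has_net_hosts_map is a hypothetical helper name, and the path to auto.master is assumed to be one of the two mentioned above:

```shell
# Hypothetical helper: succeeds when the given auto.master file contains an
# uncommented "/net -hosts" entry, fails otherwise.
has_net_hosts_map() {
  grep -Eq '^[[:space:]]*/net[[:space:]]+-hosts' "$1"
}

# Example:
#   has_net_hosts_map /etc/auto.master && echo "/net map enabled"
```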

Thanks, Cameron K.

DGrady83 commented 2 years ago

Hi @cameronkroeker - so I think that's the issue. If I'm on the file server, I don't even see /etc/auto.master or /etc/autofs/auto.master

Also, when I try running the restart autofs command as arcgis user, I see this

systemctl restart autofs
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ====
Authentication is required to restart 'autofs.service'.
Authenticating as: Ansible service account (srvansible)
Password:

cameronkroeker commented 2 years ago

My apologies, I should've noted that by default the arcgis account is not a user with sudo privileges, and the systemctl command needs to be run as root or as a user that has sudo privileges.

$ sudo systemctl restart autofs

Let's also ensure autofs is installed (perhaps this is why /etc/auto.master or /etc/autofs/auto.master doesn't exist). If autofs isn't installed, try running the following command as root or as a user that has sudo privileges:

$ sudo yum install autofs

Thanks, Cameron K.

DGrady83 commented 2 years ago

Ok, I see autofs wasn't installed on the machine. Is autofs supposed to be installed as part of the fileserver.json script? Or is it something separate that we should include in our own startup script?

cameronkroeker commented 2 years ago

The fileserver.json doesn't install/configure autofs, so it is something that needs to be done separately. Sorry for the inconvenience, this was a bit of an oversight. We will look to improve the cookbooks and/or documentation to help make it more clear/seamless.

Thanks, Cameron K.

DGrady83 commented 2 years ago

No worries. For now I've just added the line to my user data section

"yum -y install autofs\n",
"tar -xf arcgis-4.0.0-cookbooks.tar.gz -C /opt/cinc\n",
"cp /opt/cinc/templates/arcgis-enterprise-base/10.9.1/linux/arcgis-enterprise-fileserver.json /opt/cinc\n",
"(cd /opt/cinc && cinc-client -z -j arcgis-enterprise-fileserver.json)\n",

I re-ran the primary installation but that failed again creating the Portal site.

Here is output of the auto.master file. Does something need to change in here?

# Sample auto.master file
# This is a 'master' automounter map and it has the following format:
# mount-point [map-type[,format]:]map [options]
# For details of the format look at auto.master(5).
#
/misc   /etc/auto.misc
#
# NOTE: mounts done from a hosts map will be mounted with the
#       "nosuid" and "nodev" options unless the "suid" and "dev"
#       options are explicitly given.
#
/net    -hosts
#
# Include /etc/auto.master.d/*.autofs
# The included files must conform to the format of this file.
#
+dir:/etc/auto.master.d
#
# If you have fedfs set up and the related binaries, either
# built as part of autofs or installed from another package,
# uncomment this line to use the fedfs program map to access
# your fedfs mounts.
#/nfs4  /usr/sbin/fedfs-map-nfs4 nobind
#
# Include central master map if it can be found using
# nsswitch sources.
#
# Note that if there are entries for /net or /misc (as
# above) in the included master map any keys that are the
# same will not be seen as the first read key seen takes
# precedence.
#
+auto.master

Also, tried running this from the Primary but that didn't work:

touch /net/[FS_IPaddress]/gisdata/arcgisportal/content/testFile.txt
touch: cannot touch '/net/[FS_IPaddress]/gisdata/arcgisportal/content/testFile.txt': No such file or directory

cameronkroeker commented 2 years ago

Hi @DGrady83,

Could you verify that the UID of the arcgis account matches on both the file server and primary server? To check this, run the id arcgis command. For example:

arcgis@FileServer:~$ id arcgis
uid=1003(arcgis) gid=1005(arcgis) groups=1005(arcgis)

and

arcgis@PrimaryServer:~$ id arcgis
uid=1003(arcgis) gid=1005(arcgis) groups=1005(arcgis)
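
If the two id lines are collected by hand, the comparison can also be scripted. This is a minimal sketch with hypothetical helper names (uid_from_id_output, same_uid); it only parses text in the format shown above and does not connect to either machine:

```shell
# Hypothetical helper: pull the numeric UID out of one line of `id` output.
uid_from_id_output() {
  printf '%s\n' "$1" | sed -n 's/^uid=\([0-9][0-9]*\)(.*/\1/p'
}

# Hypothetical helper: succeeds only when both `id` lines carry the same UID.
same_uid() {
  [ -n "$(uid_from_id_output "$1")" ] &&
    [ "$(uid_from_id_output "$1")" = "$(uid_from_id_output "$2")" ]
}

# Example:
#   same_uid "$(id arcgis)" 'uid=1003(arcgis) gid=1005(arcgis) groups=1005(arcgis)' \
#     && echo "UIDs match"
```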

Thanks, Cameron K.

DGrady83 commented 2 years ago

Hi @cameronkroeker

Both appear to be identical. The Primary is on the left and the FS is on the right. We don't have to do anything with ssh keys in order to establish connection?

[image: screenshot not preserved]

Thank you Dan

cameronkroeker commented 2 years ago

Thank you for confirming the UID matches. Let's confirm that autofs is started/enabled on both nodes:

$ sudo systemctl status autofs.service

If it's not started:

$ sudo systemctl enable autofs.service
$ sudo systemctl start autofs.service

Or, if it's started, let's try restarting it:

$ sudo systemctl restart autofs.service

Thanks, Cameron K.

DGrady83 commented 2 years ago

OK, maybe that's the issue - autofs is not installed on the Primary. I can have it pre-installed similar to what I did with the FS and see if that changes anything.

Thanks Dan

DGrady83 commented 2 years ago

Hi @cameronkroeker - I now have autofs set to auto-install on the primary and it seems to be running. Now I see this error in the startup log when running the primary.json file on the primary server.

[image: screenshot not preserved]

DGrady83 commented 2 years ago

Hi @cameronkroeker - Any thoughts on anything else I should try here? Both servers have autofs installed and running, identical UID arcgis accounts, etc.

Not sure what else to check at this point.

thank you Dan

cameronkroeker commented 2 years ago

Hi @DGrady83,

I am not quite sure what the missing piece here is. Perhaps firewalld is enabled and blocking it? As a temporary test we could try stopping firewalld.service to see if that allows the request to go through. If it does then we know the issue is related to firewalld.

$ sudo systemctl stop firewalld.service

Another thing to try is installing nfs-utils:

$ sudo yum install -y nfs-utils

If these suggestions don't help or work, I recommend opening a support case with Esri Technical Support, as they will have additional resources such as the ability to screen share to better troubleshoot the issue in more depth.

Thanks, Cameron K.

DGrady83 commented 2 years ago

Thanks @cameronkroeker - I'll try those out.

I did have another question - for the "run_as_user" variable, the default is arcgis which is what I have been trying. Should that be a service account within our domain instead?

DGrady83 commented 2 years ago

@cameronkroeker - could you also confirm all firewall ports that are supposed to be opened?

rlhadsel commented 2 years ago

Hey @DGrady83, the "run_as_user" variable can be set to the default 'arcgis' user (which would result in a local account being created) or to a domain account using the syntax "Domain\\username". Either account should work, provided organizational policies allow its use.

rlhadsel commented 2 years ago

With respect to the ArcGIS Enterprise application components, this diagram could be useful for determining which ports need to be opened: https://enterprise.arcgis.com/en/system-requirements/latest/windows/pdf/ports-enterprise-deploy-dgm.pdf

A quick Google search suggests that NFS on UNIX systems uses port 2049: https://library.netapp.com/ecmdocs/ECMP1368834/html/GUID-C764CE34-6F5B-42BC-B04B-7001744A44A3.html#:~:text=Network%20File%20System%20(NFS)%20is,mountd%2C%20statd%2C%20and%20nlm. Hope this helps @DGrady83!

DGrady83 commented 2 years ago

Thanks @cameronkroeker @rlhadsel - another question on the fileserver.json that runs. Should we be manually adding a /hostname prefix to the gisdata directories like below? I'm still a little confused about what the proper folder structure should be in the directories and shares sections below.

{
    "arcgis": {
        "version": "10.9.1",
        "run_as_user": "arcgis",
        "fileserver": {
            "directories": [
                "/hostname/gisdata/arcgisserver",
                "/hostname/gisdata/arcgisbackup",
                "/hostname/gisdata/arcgisbackup/tilecache",
                "/hostname/gisdata/arcgisbackup/relational",
                "/hostname/gisdata/arcgisportal",
                "/hostname/gisdata/arcgisportal/content"
            ],
            "shares": [
                "/hostname/gisdata/arcgisserver",
                "/hostname/gisdata/arcgisbackup",
                "/hostname/gisdata/arcgisportal"
            ]
        }
    },
    "run_list": [
        "recipe[nfs::server]",
        "recipe[arcgis-enterprise::system]",
        "recipe[arcgis-enterprise::fileserver]"
    ]
}

cameronkroeker commented 2 years ago

No, we shouldn't need to make any modifications to the fileserver.json.

The fileserver.json will create the directories specified in node['arcgis']['fileserver']['directories'] attribute as local paths. For example:

arcgis@ip-10-0-2-231:/gisdata$ pwd
/gisdata

arcgis@ip-10-0-2-231:/gisdata$ ll
total 20
drwxr-xr-x  5 root   root 4096 Sep 29 04:00 ./
drwxr-xr-x 21 root   root 4096 Sep 29 04:00 ../
drwxr-xr-x  4 arcgis root 4096 Sep 29 04:00 arcgisbackup/
drwxr-xr-x  3 arcgis root 4096 Sep 30 02:40 arcgisportal/
drwxr-xr-x  4 arcgis root 4096 Sep 29 04:36 arcgisserver/

Then the directories specified in node['arcgis']['fileserver']['shares'] attribute are shared (nfs-utils is installed and started, and the paths are added to the /etc/exports):

arcgis@ip-10-0-2-231:/gisdata$ cat /etc/exports

# /etc/exports: the access control list for filesystems which may be exported
#               to NFS clients.  See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)
#
/gisdata/arcgisserver *(rw,sync,insecure,no_subtree_check,nohide)
/gisdata/arcgisbackup *(rw,sync,insecure,no_subtree_check,nohide)
/gisdata/arcgisportal *(rw,sync,insecure,no_subtree_check,nohide)
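
To check each expected share against the exports file without reading it by eye, something like the following could be used. It is a sketch only; is_exported is a hypothetical helper name, and it simply looks for an entry whose first field equals the given path:

```shell
# Hypothetical helper: succeeds when the exports file (second argument) has
# an entry whose first field is exactly the given path (first argument).
is_exported() {
  awk -v p="$1" '$1 == p { found = 1 } END { exit !found }' "$2"
}

# Example (on the file server):
#   for share in /gisdata/arcgisserver /gisdata/arcgisbackup /gisdata/arcgisportal; do
#     is_exported "$share" /etc/exports || echo "missing export: $share"
#   done
```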

If you have autofs.service installed/started (on both fileserver and primary) you should then be able to access these directories via /net/fileserver-hostname/gisdata/arcgisportal or /net/fileserver-ip/gisdata/arcgisportal. This is from primary:

$ systemctl status autofs.service
● autofs.service - Automounts filesystems on demand
     Loaded: loaded (/lib/systemd/system/autofs.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2022-09-29 03:59:35 UTC; 23h ago
       Docs: man:autofs(8)
    Process: 183557 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
   Main PID: 3305 (automount)
      Tasks: 4 (limit: 37885)
     Memory: 53.1M
     CGroup: /system.slice/autofs.service
             └─3305 /usr/sbin/automount --pid-file /var/run/autofs.pid

arcgis@ip-10-0-2-11:/opt$ touch /net/10.0.2.231/gisdata/arcgisportal/content/testFile.txt

arcgis@ip-10-0-2-11:/opt$ ls -al /net/10.0.2.231/gisdata/arcgisportal/content/testFile.txt
-rw-rw-r-- 1 arcgis arcgis 0 Sep 30 03:05 /net/10.0.2.231/gisdata/arcgisportal/content/testFile.txt

arcgis@ip-10-0-2-11:/opt$ cd /net/10.0.2.231/gisdata/arcgisportal/content

arcgis@ip-10-0-2-11:/net/10.0.2.231/gisdata/arcgisportal/content$ pwd
/net/10.0.2.231/gisdata/arcgisportal/content

arcgis@ip-10-0-2-11:/net/10.0.2.231/gisdata/arcgisportal/content$ ls -al
total 1120
drwxr-xr-x     3 arcgis root      4096 Sep 30 03:06 .
drwxr-xr-x     3 arcgis root      4096 Sep 30 03:06 ..
drwxr-x--- 20586 arcgis arcgis 1130496 Sep 29 18:04 items
-rw-r-----     1 arcgis arcgis       2 Sep 29 04:25 site-key.json
-rw-rw-r--     1 arcgis arcgis       0 Sep 30 03:05 testFile.txt

Have you tried restarting NFS on the fileserver?

$ sudo exportfs -a
$ sudo systemctl restart nfs-server.service

Also try accessing the share via the fileserver hostname instead of ip:

$ touch /net/FileserverHostName/gisdata/arcgisportal/content/testFile.txt

If autofs isn't going to work then as an alternative you can create an NFS mount.

  1. Run fileserver.json on fileserver machine
  2. On primary machine create an nfs mount:

    Note: 10.0.2.231 is the ip of the fileserver machine. You can use hostname instead of ip as well.

$ sudo yum install nfs-utils
$ sudo mkdir -p /nfs_share/gisdata
$ sudo mount 10.0.2.231:/gisdata /nfs_share/gisdata -o noac
$ sudo df -h

Filesystem                        Size  Used Avail Use% Mounted on
...
10.0.2.231:/gisdata/arcgisportal   97G  4.9G   92G   5% /net/10.0.2.231/gisdata/arcgisportal
10.0.2.231:/gisdata/arcgisserver   97G  4.9G   92G   5% /net/10.0.2.231/gisdata/arcgisserver
10.0.2.231:/gisdata                97G  4.9G   92G   5% /nfs_share/gisdata

For more detailed steps on NFS Mount see: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_file_systems/mounting-nfs-shares_managing-file-systems
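
A mount created with the mount command does not survive a reboot. One common way to make it persistent is an /etc/fstab entry. The sketch below only prints such a line so it can be reviewed first; fstab_entry is a hypothetical helper name, and the noac option is carried over from the mount command above as an assumption about what the deployment needs:

```shell
# Hypothetical helper: print an /etc/fstab line for an NFS mount.
# Arguments: file server host or IP, exported path, local mount point.
fstab_entry() {
  server="$1"
  export_path="$2"
  mount_point="$3"
  # Fields: device, mount point, filesystem type, options, dump, fsck pass.
  printf '%s:%s %s nfs noac 0 0\n' "$server" "$export_path" "$mount_point"
}

# Example (review the output before appending it to /etc/fstab):
#   fstab_entry 10.0.2.231 /gisdata /nfs_share/gisdata
```

With a manual mount like this, the connection strings in the primary JSON would presumably need to point at the mount point (for example /nfs_share/gisdata/arcgisportal/content) rather than at a /net/... path.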

Thanks, Cameron K.

DGrady83 commented 2 years ago

Hi @cameronkroeker - Thank you for the detailed info. I tried all above but still nothing seems to be working. Even stopping firewall on both servers, then trying to run the mount command, I get the mount.nfs: Connection timed out error. At this point I'm thinking it may be a security policy on the EC2 machines I am using. Should I try adding anything specific to those?
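
A timeout from mount.nfs with the firewall stopped on both machines may point to something in the network path, such as an EC2 security group that does not allow inbound traffic on the NFS port. A rough way to test this from the primary is to see whether a TCP connection to the file server can be opened at all. This is a sketch, assuming bash and the coreutils timeout command are available and that NFS listens on TCP port 2049 as mentioned earlier in the thread; port_open is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port can be
# opened within 3 seconds, fails otherwise.
port_open() {
  host="$1"
  port="$2"
  # /dev/tcp is a bash feature, so bash is invoked explicitly.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example (from the primary server):
#   port_open 10.0.2.231 2049 && echo "NFS port reachable" || echo "NFS port blocked or closed"
```

If the port is not reachable, the security group attached to the file server instance is one place to look.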

empeekdev commented 7 months ago

For anyone who has the same issue: our primary server was not able to access the fileserver folders either. The issue we had was that autofs couldn't mount the NFS shares automatically. We spent a lot of time trying to investigate this issue and decided just to mount these folders manually.