Esri / arcgis-powershell-dsc

This repository contains scripts, code and samples for automating the install and configuration of ArcGIS (Enterprise and Desktop) using Microsoft Windows PowerShell DSC (Desired State Configuration).
Apache License 2.0

11.1 Portal upgrade failure #490

Open PleaseStopAsking opened 11 months ago

PleaseStopAsking commented 11 months ago

Community Note

Module Version

Affected Resource(s)

Configuration Files

{
    "Notes": {
        "Updated": "2023-08-08",
        "Version": "0.1.0",
        "ArcGISModule": "4.1.0"
    },
    "AllNodes": [
        {
            "NodeName": "redacted",
            "Role": [
                "Portal"
            ]
        },
        {
            "NodeName": "redacted",
            "Role": [
                "Portal"
            ]
        }
    ],
    "ConfigData": {
        "Version": "11.1",
        "OldVersion": "10.9.1",
        "ServerContext": "arcgis",
        "PortalContext": "arcgis",
        "DownloadPatches": true,
        "Credentials": {
            "ServiceAccount": {
                "UserName": "arcgis",
                "Password": "redacted",
                "IsMSAAccount": false,
                "IsDomainAccount": false
            }
        },
        "Portal": {
            "LicenseFilePath": "C:\\AllUTs_AllAddOnApps.json",
            "PortalLicenseUserTypeId": "creatorUT",
            "EnableAutomaticAccountCreation": false,
            "DisableServiceDirectory": true,
            "DisableAnonymousAccess": true,
            "EnableHSTS": true,
            "Installer": {
                "Path": "C:\\Portal_for_ArcGIS_Windows_111_185219.exe",
                "PatchesDir": "C:\\ArcGISPatches",
                "InstallDir": "C:\\Program Files\\ArcGIS\\Portal",
                "ContentDir": "C:\\arcgisportal"
            },
            "ContentDirectoryLocation": "\\\\redacted\\arcgisportal\\content",
            "PortalAdministrator": {
                "UserName": "portaladmin",
                "Email": "redacted@redacted.com",
                "Password": "redacted",
                "SecurityQuestionIndex": 1,
                "SecurityAnswer": "redacted"
            }
        }
    }
}

Expected Behavior

The HA portal deployment is upgraded successfully.

Actual Behavior

The upgrade of the HA portal deployment fails at the PortalPostUpgrade step.

Description

We attempted to upgrade an HA Enterprise deployment from 10.9.1 to 11.1 and found that the process fails. Based on our testing so far, the failure appears to be caused by the order of the nodes defined in the JSON config. Specifically, the Portal nodes need to be ordered so that the primary Portal instance in the HA site is listed first.

Based on the logic in the Invoke-PortalUpgradeScript function, the primary and secondary machines are determined by the order in which they are listed in the JSON config. This determination is then used to kick off a step on the assumed secondary Portal, which only updates the Portal DataStore host identifier properties file at C:\Program Files\ArcGIS\Portal\framework\runtime\ds\framework\etc\hostidentifier.properties and then restarts the Portal service. The actual post-upgrade step is then carried out on the assumed primary.
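To make the ordering dependency concrete, here is a minimal Python sketch of order-based role assignment as we understand it. It is not the module's code (Invoke-PortalUpgradeScript is PowerShell), the function name `assumed_roles` is hypothetical, and the node names `portal-a` and `portal-b` are placeholders for the redacted machines.

```python
def assumed_roles(config):
    """Return the primary/secondary split implied purely by list order.

    Assumption: the first node with the "Portal" role is treated as the
    primary and every later Portal node as a secondary, regardless of
    what the running HA site actually reports.
    """
    portal_nodes = [
        node["NodeName"]
        for node in config["AllNodes"]
        if "Portal" in node.get("Role", [])
    ]
    if not portal_nodes:
        raise ValueError("config contains no node with the Portal role")
    return {"primary": portal_nodes[0], "secondary": portal_nodes[1:]}


# Placeholder node names; the real ones are redacted in this issue.
example_config = {
    "AllNodes": [
        {"NodeName": "portal-a", "Role": ["Portal"]},
        {"NodeName": "portal-b", "Role": ["Portal"]},
    ]
}

roles = assumed_roles(example_config)
print(roles["primary"], roles["secondary"])
```

If `portal-b` is the machine the site reports as primary, this assignment is inverted: the restart step lands on the real primary and the post-upgrade step runs on the real standby.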

The documentation for upgrading Portal states "... then start the upgrade process on either machine", which suggests that the failure via DSC may be related to restarting the assumed secondary Portal when it is in fact the primary portal according to the actual site configuration.

We have reproduced this in two separate HA deployments, and found that we can work around it by ensuring the primary portal machine is listed first in the JSON config.
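The workaround can be scripted before starting the upgrade. The following Python sketch moves the known primary to the front of `AllNodes`. The helper name `put_primary_first` and the node names are hypothetical, and the primary's name has to be looked up manually first (for example via .../portaladmin/machines).

```python
import copy


def put_primary_first(config, primary_name):
    """Return a copy of the config with the primary Portal node listed first.

    The relative order of all other nodes is preserved and the input
    config is left unmodified.
    """
    def is_primary(node):
        return (
            node["NodeName"].lower() == primary_name.lower()
            and "Portal" in node.get("Role", [])
        )

    if not any(is_primary(node) for node in config["AllNodes"]):
        raise ValueError(f"no Portal node named {primary_name!r} in AllNodes")

    reordered = copy.deepcopy(config)
    # sorted() is stable: False sorts before True, so the primary moves to
    # the front while the remaining nodes keep their original order.
    reordered["AllNodes"] = sorted(
        reordered["AllNodes"], key=lambda node: not is_primary(node)
    )
    return reordered


# Placeholder node names; the real ones are redacted in this issue.
config = {
    "AllNodes": [
        {"NodeName": "portal-a", "Role": ["Portal"]},
        {"NodeName": "portal-b", "Role": ["Portal"]},
    ]
}
fixed = put_primary_first(config, "portal-b")
print([node["NodeName"] for node in fixed["AllNodes"]])
```

In practice the config would be read with `json.load`, reordered, and written back with `json.dump` before invoking the upgrade.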

We are not sure whether Portal attempts a partial failover when the primary goes down for a restart during an upgrade, but if it does, that could explain what we are seeing in our testing.

Steps to Reproduce

  1. Deploy an HA portal environment at 10.9.1 with DSC
  2. Within the config provided above (used for upgrading to 11.1), set the second node in the array to the primary machine in the HA portal site. Verify which machine is listed as primary via .../portaladmin/machines
  3. Start the upgrade, which should fail at the PortalPostUpgrade step.
    • The PortalPostUpgrade step reaches the final phase (Upgrade standby machine) and then errors out with:
      {"lastUpdated":1691524771647,"name":"Upgrade database","startTime":1691524615102,"state":"completed"},{"lastUpdated":1691524824307,"name":"Migrate configuration settings","startTime":1691524822608,"state":"completed"},{"lastUpdated":1691524919123,"name":"Update configuration settings","startTime":1691524877634,"state":"completed"},{"lastUpdated":1691524877634,"name":"Configure index service","startTime":1691524844022,"state":"completed"},{"lastUpdated":1691525041227,"name":"Reindex","startTime":1691524920167,"state":"completed"},{"lastUpdated":1691525930553,"name":"Upgrade standby machine","startTime":1691525210616,"state":"failed"}],"messages":["Index Service configuration failed."],"recheckAfterSeconds":20}
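For anyone triaging similar output, the stage list can be scanned for the failing phase. The Python sketch below uses the stage entries copied from the status output above. The enclosing response structure is truncated there, so only the list of stages is modelled, and the helper name `failed_stages` is hypothetical.

```python
# Stage entries copied from the status output in this issue.
stages = [
    {"lastUpdated": 1691524771647, "name": "Upgrade database",
     "startTime": 1691524615102, "state": "completed"},
    {"lastUpdated": 1691524824307, "name": "Migrate configuration settings",
     "startTime": 1691524822608, "state": "completed"},
    {"lastUpdated": 1691524919123, "name": "Update configuration settings",
     "startTime": 1691524877634, "state": "completed"},
    {"lastUpdated": 1691524877634, "name": "Configure index service",
     "startTime": 1691524844022, "state": "completed"},
    {"lastUpdated": 1691525041227, "name": "Reindex",
     "startTime": 1691524920167, "state": "completed"},
    {"lastUpdated": 1691525930553, "name": "Upgrade standby machine",
     "startTime": 1691525210616, "state": "failed"},
]


def failed_stages(stage_list):
    """Return (name, elapsed_seconds) for every stage in state "failed".

    Timestamps are epoch milliseconds, so the difference is divided by
    1000 to get seconds.
    """
    return [
        (stage["name"], (stage["lastUpdated"] - stage["startTime"]) / 1000)
        for stage in stage_list
        if stage["state"] == "failed"
    ]


for name, seconds in failed_stages(stages):
    print(f"{name} failed after {seconds:.0f} s")
```

Here the only failed stage is "Upgrade standby machine", which ran for roughly twelve minutes before failing, while every earlier stage completed.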
