11.1 Portal upgrade failure

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request.

Module Version

4.1.0

Affected Resource(s)

Invoke-ArcGISConfiguration
- ArcGIS_PortalUpgrade

Configuration Files

{
    "Notes": {
        "Updated": "2023-08-08",
        "Version": "0.1.0",
        "ArcGISModule": "4.1.0"
    },
    "AllNodes": [
        {
            "NodeName": "redacted",
            "Role": [
                "Portal"
            ]
        },
        {
            "NodeName": "redacted",
            "Role": [
                "Portal"
            ]
        }
    ],
    "ConfigData": {
        "Version": "11.1",
        "OldVersion": "10.9.1",
        "ServerContext": "arcgis",
        "PortalContext": "arcgis",
        "DownloadPatches": true,
        "Credentials": {
            "ServiceAccount": {
                "UserName": "arcgis",
                "Password": "redacted",
                "IsMSAAccount": false,
                "IsDomainAccount": false
            }
        },
        "Portal": {
            "LicenseFilePath": "C:\\AllUTs_AllAddOnApps.json",
            "PortalLicenseUserTypeId": "creatorUT",
            "EnableAutomaticAccountCreation": false,
            "DisableServiceDirectory": true,
            "DisableAnonymousAccess": true,
            "EnableHSTS": true,
            "Installer": {
                "Path": "C:\\Portal_for_ArcGIS_Windows_111_185219.exe",
                "PatchesDir": "C:\\ArcGISPatches",
                "InstallDir": "C:\\Program Files\\ArcGIS\\Portal",
                "ContentDir": "C:\\arcgisportal"
            },
            "ContentDirectoryLocation": "\\\\redacted\\arcgisportal\\content",
            "PortalAdministrator": {
                "UserName": "portaladmin",
                "Email": "redacted@redacted.com",
                "Password": "redacted",
                "SecurityQuestionIndex": 1,
                "SecurityAnswer": "redacted"
            }
        }
    }
}

Expected Behavior

The HA Portal deployment is upgraded successfully.

Actual Behavior

The upgrade of the HA Portal deployment fails.

Description

We attempted to upgrade an HA ArcGIS Enterprise deployment from 10.9.1 to 11.1 and found that the process fails. Based on our testing so far, the failure appears to be caused by the order of the nodes in the JSON config. Specifically, the Portal nodes must be ordered so that the primary Portal machine in the HA site is listed first.

Based on the logic in the Invoke-PortalUpgradeScript function, the primary and secondary machines are determined by the order in which they are listed in the JSON config. That determination is then used to run a step on the assumed secondary Portal. This step only updates the Portal DataStore host identifier properties file at C:\Program Files\ArcGIS\Portal\framework\runtime\ds\framework\etc\hostidentifier.properties and then restarts the Portal service. The actual post-upgrade step is then carried out on the assumed primary.
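To make the ordering assumption concrete, here is a minimal Python sketch of the behaviour described above. It is not the module's PowerShell code. The functions `assign_portal_roles` and `plan_upgrade_steps` and the host names `portal-a`/`portal-b` are hypothetical. The sketch only mimics the described rule: the first Portal node in AllNodes is treated as primary and the second as secondary, without asking Portal which machine is actually primary.

```python
import json

def assign_portal_roles(config: dict) -> dict:
    # Hypothetical mirror of the ordering rule: the first Portal node in
    # AllNodes is assumed primary, the second assumed secondary.
    # Nothing here checks which machine Portal itself reports as primary.
    portal_nodes = [
        node["NodeName"]
        for node in config["AllNodes"]
        if "Portal" in node.get("Role", [])
    ]
    if len(portal_nodes) != 2:
        raise ValueError("expected exactly two Portal nodes in an HA site")
    return {"primary": portal_nodes[0], "secondary": portal_nodes[1]}

def plan_upgrade_steps(roles: dict) -> list:
    # As described: the assumed secondary only updates hostidentifier.properties
    # and restarts Portal; the assumed primary then runs the post-upgrade step.
    return [
        (roles["secondary"], "update hostidentifier.properties, restart Portal service"),
        (roles["primary"], "run Portal post-upgrade"),
    ]

# Hypothetical host names: portal-b is the real primary but is listed second.
sample = json.loads("""{"AllNodes": [
    {"NodeName": "portal-a", "Role": ["Portal"]},
    {"NodeName": "portal-b", "Role": ["Portal"]}
]}""")

for host, step in plan_upgrade_steps(assign_portal_roles(sample)):
    print(f"{host}: {step}")
```

With this ordering, the restart lands on portal-b, the machine that is actually primary. That is the situation the issue describes.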

The documentation for upgrading Portal states "... then start the upgrade process on either machine". This seems to indicate that the issue with DSC may be related to restarting the assumed secondary Portal, which, according to the actual site configuration, is in fact the primary Portal.

We have reproduced this in two separate HA deployments. We also found that we can work around it by ensuring the primary Portal machine is listed first in the JSON config.
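The workaround can be scripted. The following is a minimal Python sketch, assuming you already know the primary machine's NodeName (for example, by checking it manually as described in the reproduction steps). The function name `put_primary_first` and the host names are hypothetical. The function does not query Portal. It only reorders AllNodes in a copy of the config.

```python
import copy
import json

def put_primary_first(config: dict, primary_node: str) -> dict:
    # Hypothetical helper: return a copy of the DSC config with the known
    # primary Portal machine moved to the front of AllNodes.
    names = [node["NodeName"] for node in config["AllNodes"]]
    if primary_node not in names:
        raise ValueError(f"{primary_node!r} is not listed in AllNodes")
    reordered = copy.deepcopy(config)
    # sorted() is stable and False sorts before True, so the primary moves to
    # index 0 while every other node keeps its relative order.
    reordered["AllNodes"] = sorted(
        reordered["AllNodes"], key=lambda node: node["NodeName"] != primary_node
    )
    return reordered

config = {
    "AllNodes": [
        {"NodeName": "portal-a", "Role": ["Portal"]},
        {"NodeName": "portal-b", "Role": ["Portal"]},
    ],
    "ConfigData": {"Version": "11.1", "OldVersion": "10.9.1"},
}
fixed = put_primary_first(config, "portal-b")
print(json.dumps(fixed, indent=4))
```

The original dict is left untouched, so the reordered copy can be written to a new config file and compared before running Invoke-ArcGISConfiguration.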

We are not sure whether Portal attempts a half-baked failover when the primary goes down for a restart during an upgrade, but if it does, that could explain what we are seeing in our testing.

Steps to Reproduce

1. Deploy an HA Portal environment at 10.9.1 with DSC.
2. In the config provided above (used for upgrading to 11.1), set the second node in the AllNodes array to the primary machine in the HA Portal site. You can verify which machine is the primary via .../portaladmin/machines.
3. Start the upgrade. It should fail on the PortalPostUpgrade step.

The PortalPostUpgrade step reaches the final phase (Upgrade standby machine) and then fails with the following status:

{"lastUpdated":1691524771647,"name":"Upgrade database","startTime":1691524615102,"state":"completed"},{"lastUpdated":1691524824307,"name":"Migrate configuration settings","startTime":1691524822608,"state":"completed"},{"lastUpdated":1691524919123,"name":"Update configuration settings","startTime":1691524877634,"state":"completed"},{"lastUpdated":1691524877634,"name":"Configure index service","startTime":1691524844022,"state":"completed"},{"lastUpdated":1691525041227,"name":"Reindex","startTime":1691524920167,"state":"completed"},{"lastUpdated":1691525930553,"name":"Upgrade standby machine","startTime":1691525210616,"state":"failed"}],"messages":["Index Service configuration failed."],"recheckAfterSeconds":20}
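When comparing runs, it can help to pull the failed stage and its duration out of this status payload. The following is a minimal Python sketch that uses two stages copied from the output above. The captured output is cut off at the start, so the key holding the stage list (`stages` here) is an assumption, not a documented Portal schema. The function `failed_stages` is hypothetical.

```python
import json

# Trimmed from the status output above. The "stages" key name is assumed,
# because the beginning of the captured payload is missing.
STATUS = json.loads("""{
  "stages": [
    {"lastUpdated": 1691525041227, "name": "Reindex", "startTime": 1691524920167, "state": "completed"},
    {"lastUpdated": 1691525930553, "name": "Upgrade standby machine", "startTime": 1691525210616, "state": "failed"}
  ],
  "messages": ["Index Service configuration failed."],
  "recheckAfterSeconds": 20
}""")

def failed_stages(status: dict) -> list:
    # Return (stage name, seconds between startTime and lastUpdated) for each
    # stage whose state is "failed"; the timestamps are epoch milliseconds.
    return [
        (stage["name"], (stage["lastUpdated"] - stage["startTime"]) / 1000)
        for stage in status.get("stages", [])
        if stage.get("state") == "failed"
    ]

for name, seconds in failed_stages(STATUS):
    print(f"{name}: failed after {seconds:.0f} s")
print("messages:", "; ".join(STATUS.get("messages", [])))
```

On this sample, the standby upgrade ran for roughly 12 minutes before reporting "Index Service configuration failed."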

Esri / arcgis-powershell-dsc