GoogleCloudPlatform / active-assist

KeyError: key not found: results #12

Closed (ryanaross closed 1 year ago)

ryanaross commented 1 year ago

The Cloud Scheduler is currently sending these parameters to the main workflow:

{
  "allowMetrics": false,
  "essentialContactCategories": [],
  "isDryRun": true,
  "numDaysTTL": 30,
  "optOutProjectNumbers": [],
  "organizationId": <my org id number>,
  "region": "us-central1",
  "timeZoneName": "America/Los_Angeles"
}

But it fails after 2 minutes with this error:

Execution failed or cancelled.
in step "processRecommendations", routine "main", line: 100
{
  "message": "Execution failed or cancelled.",
  "operation": {
    "argument": "{\"allowMetrics\":false,\"datasetId\":\"recommendation_workflow_dataset\",\"essentialContactCategories\":[],\"formattedRunTimestamp\":\"2023-03-24\",\"formattedTtlTimestampForFirstTimeRecos\":\"2023-04-23\",\"isDryRun\":true,\"optOutProjectNumbers\":[],\"organizationId\":<my-org-id>,\"projectId\":\"<my-project-id>\",\"recommendationStatesTableId\":\"recommendation_states_<my-org-id>\",\"runTimestamp\":1679677671364,\"ttlTimestampForFirstTimeRecos\":1682233200000}",
    "duration": "116.970566652s",
    "endTime": "2023-03-24T17:09:50.972663117Z",
    "error": {
      "context": "KeyError: key not found: results\nin step \"verifyOnlyOneSetOfResults\", routine \"get_parent_contacts\", line: 506\nin step \"getParentContactsIfCategoriesEmpty\", routine \"get_escalation_contacts\", line: 424\nin step \"getContacts\", routine \"main\", line: 193",
      "payload": "{\"message\":\"KeyError: key not found: results\",\"tags\":[\"KeyError\",\"LookupError\"]}",
      "stackTrace": {
        "elements": [
          {
            "position": {
              "column": "21",
              "length": "4",
              "line": "193"
            },
            "routine": "main",
            "step": "getContacts"
          },
          {
            "position": {
              "column": "20",
              "length": "4",
              "line": "424"
            },
            "routine": "get_escalation_contacts",
            "step": "getParentContactsIfCategoriesEmpty"
          },
          {
            "position": {
              "column": "22",
              "length": "42",
              "line": "506"
            },
            "routine": "get_parent_contacts",
            "step": "verifyOnlyOneSetOfResults"
          }
        ]
      }
    },
    "name": "projects/<my id number>/locations/us-central1/workflows/recommendations_workflow_process_recommendations/executions/bdb9ed18-700f-4c15-9d94-1848edf40abf",
    "startTime": "2023-03-24T17:07:54.002096465Z",
    "state": "FAILED",
    "status": {
      "currentSteps": [
        {
          "routine": "main",
          "step": "getContacts"
        },
        {
          "routine": "get_escalation_contacts",
          "step": "getParentContactsIfCategoriesEmpty"
        },
        {
          "routine": "get_parent_contacts",
          "step": "verifyOnlyOneSetOfResults"
        }
      ]
    },
    "workflowRevisionId": "000001-4f6"
  },
  "tags": [
    "OperationError"
  ]
}

What sort of debugging can be done from here?
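As a starting point, the failing step can be pulled out of an error payload like the one above with a few lines of Python. This is only a sketch: the helper name `summarize_failure` is hypothetical, the sample is a trimmed copy of the payload, and it assumes the last entry in `stackTrace.elements` is the innermost frame (which matches the order in the `context` string above).

```python
import json

# Trimmed copy of the error payload above; only the fields used here are kept.
SAMPLE = {
    "operation": {
        "error": {
            "payload": '{"message":"KeyError: key not found: results",'
                       '"tags":["KeyError","LookupError"]}',
            "stackTrace": {
                "elements": [
                    {"position": {"line": "193"},
                     "routine": "main", "step": "getContacts"},
                    {"position": {"line": "424"},
                     "routine": "get_escalation_contacts",
                     "step": "getParentContactsIfCategoriesEmpty"},
                    {"position": {"line": "506"},
                     "routine": "get_parent_contacts",
                     "step": "verifyOnlyOneSetOfResults"},
                ]
            },
        }
    }
}


def summarize_failure(execution_error: dict) -> dict:
    """Return the error message and the innermost failing step (hypothetical helper)."""
    error = execution_error["operation"]["error"]
    # Assumption: frames are listed outermost first, so the last one is innermost.
    frame = error["stackTrace"]["elements"][-1]
    # The payload field is itself a JSON-encoded string.
    payload = json.loads(error["payload"])
    return {
        "message": payload["message"],
        "routine": frame["routine"],
        "step": frame["step"],
        "line": int(frame["position"]["line"]),
    }


print(summarize_failure(SAMPLE))
```

Here that points at the `verifyOnlyOneSetOfResults` step in `get_parent_contacts`, which is where the `results` key is read.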

xiangwa commented 1 year ago

Thank you Ryan for flagging this! We're looking into it.

xiangwa commented 1 year ago

Ryan: I ran the workflow from scratch and everything worked in my environment, so I wonder whether something specific to your environment is triggering this.

The execution failed at the step where the workflow fetches the IAM policy on the project's parent for a specific role (roles/resourcemanager.folderAdmin or roles/resourcemanager.organizationAdmin), so my guess is that neither role is assigned on the parent.

https://github.com/GoogleCloudPlatform/active-assist/blob/b8425b3a50aad4da46112aa794da0f7c57b2a69f/remora-project-cleaner/terraform/modules/project-cleanup/workflows/recommendations_workflow_process_recommendations.yaml#L500

I'll submit a PR to improve the logging for this case. In the meantime, you could update the setWhatRolesToLookFor step to search for a role that exists on the parent resource, or grant someone roles/resourcemanager.organizationAdmin and see whether the error goes away.
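To check which of the two roles is actually bound on the parent, one option is to export its IAM policy as JSON (for example with `gcloud organizations get-iam-policy <org id> --format=json`) and scan the bindings. The sketch below assumes the standard IAM policy shape (`bindings`, each with a `role` and `members`); the helper name and the sample policy are made up for illustration and are not part of the workflow.

```python
# Roles the workflow looks for on the parent, as named in this thread.
WANTED_ROLES = {
    "roles/resourcemanager.folderAdmin",
    "roles/resourcemanager.organizationAdmin",
}


def members_with_roles(policy: dict, roles: set) -> dict:
    """Map each wanted role found in the policy to its members (hypothetical helper)."""
    found = {}
    for binding in policy.get("bindings", []):
        role = binding.get("role")
        if role in roles:
            found.setdefault(role, []).extend(binding.get("members", []))
    return found


# Hypothetical policy: only a viewer role is bound, so neither wanted role is present.
sample_policy = {
    "bindings": [
        {
            "role": "roles/resourcemanager.organizationViewer",
            "members": ["user:alice@example.com"],
        }
    ]
}

print(members_with_roles(sample_policy, WANTED_ROLES))
```

An empty result would be consistent with the guess above that neither role is assigned on the parent.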

xiangwa commented 1 year ago

I was able to reproduce the issue by looking for a fake role:

"Failed to find IAM policy results for the resource organizations/ and role roles/resourcemanager.foobar."
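The underlying pattern is a lookup of `results` on a response that omits the key when nothing matches. The real workflow is written in YAML; the following is only a Python analogue of guarding that lookup, with a hypothetical helper name and a placeholder resource ID, assuming the response is a plain map.

```python
def results_or_fail(response: dict, resource: str, role: str) -> list:
    """Return response["results"], or raise a descriptive error if it is missing or empty."""
    # Using .get avoids the bare KeyError when the key is absent.
    results = response.get("results", [])
    if not results:
        raise LookupError(
            f"Failed to find IAM policy results for the resource {resource} "
            f"and role {role}."
        )
    return results


# Placeholder resource ID; an empty response reproduces the descriptive failure.
try:
    results_or_fail({}, "organizations/123", "roles/resourcemanager.foobar")
except LookupError as exc:
    print(exc)
```

Raising with the resource and role in the message makes the failure self-explanatory, in the same spirit as the message quoted above.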