GoogleCloudPlatform / professional-services

Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.
Apache License 2.0

CAI Export Pipeline is broken: KeyError: "definitions [while running 'assign_group_by_key']" #665

Closed jf-marquis-Adeo closed 3 years ago

jf-marquis-Adeo commented 3 years ago

Hello @bmenasha, the pipeline has been broken since yesterday evening. I imagine something changed on the Google side? Can you help, please? Thanks a lot in advance.

{
  "insertId": "8652472802243659974:124764:0:75485",
  "jsonPayload": {
    "step": "s5-read-shuffle43",
    "exception": "Traceback (most recent call last):\n  File \"apache_beam/runners/common.py\", line 1213, in apache_beam.runners.common.DoFnRunner.process\n  File \"apache_beam/runners/common.py\", line 569, in apache_beam.runners.common.SimpleInvoker.invoke_process\n  File \"apache_beam/runners/common.py\", line 1374, in apache_beam.runners.common._OutputProcessor.process_outputs\n  File \"apache_beam/runners/worker/operations.py\", line 155, in apache_beam.runners.worker.operations.ConsumerSet.receive\n  File \"apache_beam/runners/worker/operations.py\", line 1006, in apache_beam.runners.worker.operations.PGBKCVOperation.process\n  File \"apache_beam/runners/worker/operations.py\", line 1035, in apache_beam.runners.worker.operations.PGBKCVOperation.process\n  File \"./asset_inventory/import_pipeline.py\", line 150, in add_input\n    resource_schema = self.element_to_schema(element)\n  File \"./asset_inventory/import_pipeline.py\", line 147, in element_to_schema\n    'iam_policy' in element)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 454, in bigquery_schema_for_resource\n    resource_name)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 102, in _get_schema_for_resource\n    for document in discovery_documents\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 102, in <listcomp>\n    for document in discovery_documents\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 280, in _translate_resource_to_schema\n    resources = cls._get_document_resources(document)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 272, in _get_document_resources\n    return document['definitions']\nKeyError: 'definitions'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File 
\"/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py\", line 649, in do_work\n    work_executor.execute()\n  File \"/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py\", line 179, in execute\n    op.start()\n  File \"dataflow_worker/shuffle_operations.py\", line 63, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start\n  File \"dataflow_worker/shuffle_operations.py\", line 64, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start\n  File \"dataflow_worker/shuffle_operations.py\", line 79, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start\n  File \"dataflow_worker/shuffle_operations.py\", line 80, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start\n  File \"dataflow_worker/shuffle_operations.py\", line 84, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start\n  File \"apache_beam/runners/worker/operations.py\", line 356, in apache_beam.runners.worker.operations.Operation.output\n  File \"apache_beam/runners/worker/operations.py\", line 218, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive\n  File \"dataflow_worker/shuffle_operations.py\", line 261, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.process\n  File \"dataflow_worker/shuffle_operations.py\", line 268, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.process\n  File \"apache_beam/runners/worker/operations.py\", line 356, in apache_beam.runners.worker.operations.Operation.output\n  File \"apache_beam/runners/worker/operations.py\", line 218, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive\n  File \"apache_beam/runners/worker/operations.py\", line 703, in apache_beam.runners.worker.operations.DoOperation.process\n  File \"apache_beam/runners/worker/operations.py\", line 704, in apache_beam.runners.worker.operations.DoOperation.process\n  File \"apache_beam/runners/common.py\", line 1215, 
in apache_beam.runners.common.DoFnRunner.process\n  File \"apache_beam/runners/common.py\", line 1279, in apache_beam.runners.common.DoFnRunner._reraise_augmented\n  File \"apache_beam/runners/common.py\", line 1213, in apache_beam.runners.common.DoFnRunner.process\n  File \"apache_beam/runners/common.py\", line 569, in apache_beam.runners.common.SimpleInvoker.invoke_process\n  File \"apache_beam/runners/common.py\", line 1374, in apache_beam.runners.common._OutputProcessor.process_outputs\n  File \"apache_beam/runners/worker/operations.py\", line 218, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive\n  File \"apache_beam/runners/worker/operations.py\", line 703, in apache_beam.runners.worker.operations.DoOperation.process\n  File \"apache_beam/runners/worker/operations.py\", line 704, in apache_beam.runners.worker.operations.DoOperation.process\n  File \"apache_beam/runners/common.py\", line 1215, in apache_beam.runners.common.DoFnRunner.process\n  File \"apache_beam/runners/common.py\", line 1294, in apache_beam.runners.common.DoFnRunner._reraise_augmented\n  File \"/usr/local/lib/python3.7/site-packages/future/utils/__init__.py\", line 446, in raise_with_traceback\n    raise exc.with_traceback(traceback)\n  File \"apache_beam/runners/common.py\", line 1213, in apache_beam.runners.common.DoFnRunner.process\n  File \"apache_beam/runners/common.py\", line 569, in apache_beam.runners.common.SimpleInvoker.invoke_process\n  File \"apache_beam/runners/common.py\", line 1374, in apache_beam.runners.common._OutputProcessor.process_outputs\n  File \"apache_beam/runners/worker/operations.py\", line 155, in apache_beam.runners.worker.operations.ConsumerSet.receive\n  File \"apache_beam/runners/worker/operations.py\", line 1006, in apache_beam.runners.worker.operations.PGBKCVOperation.process\n  File \"apache_beam/runners/worker/operations.py\", line 1035, in apache_beam.runners.worker.operations.PGBKCVOperation.process\n  File 
\"./asset_inventory/import_pipeline.py\", line 150, in add_input\n    resource_schema = self.element_to_schema(element)\n  File \"./asset_inventory/import_pipeline.py\", line 147, in element_to_schema\n    'iam_policy' in element)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 454, in bigquery_schema_for_resource\n    resource_name)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 102, in _get_schema_for_resource\n    for document in discovery_documents\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 102, in <listcomp>\n    for document in discovery_documents\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 280, in _translate_resource_to_schema\n    resources = cls._get_document_resources(document)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 272, in _get_document_resources\n    return document['definitions']\nKeyError: \"definitions [while running 'assign_group_by_key']\"\n",
    "worker": "sre-relaunch-2021-06-29-06292141-1qo8-harness-f4fb",
    "job": "2021-06-29_21_41_43-2985006778211866139",
    "message": "An exception was raised when trying to execute the workitem 8833639980881219882 : Traceback (most recent call last):\n  File \"apache_beam/runners/common.py\", line 1213, in apache_beam.runners.common.DoFnRunner.process\n  File \"apache_beam/runners/common.py\", line 569, in apache_beam.runners.common.SimpleInvoker.invoke_process\n  File \"apache_beam/runners/common.py\", line 1374, in apache_beam.runners.common._OutputProcessor.process_outputs\n  File \"apache_beam/runners/worker/operations.py\", line 155, in apache_beam.runners.worker.operations.ConsumerSet.receive\n  File \"apache_beam/runners/worker/operations.py\", line 1006, in apache_beam.runners.worker.operations.PGBKCVOperation.process\n  File \"apache_beam/runners/worker/operations.py\", line 1035, in apache_beam.runners.worker.operations.PGBKCVOperation.process\n  File \"./asset_inventory/import_pipeline.py\", line 150, in add_input\n    resource_schema = self.element_to_schema(element)\n  File \"./asset_inventory/import_pipeline.py\", line 147, in element_to_schema\n    'iam_policy' in element)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 454, in bigquery_schema_for_resource\n    resource_name)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 102, in _get_schema_for_resource\n    for document in discovery_documents\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 102, in <listcomp>\n    for document in discovery_documents\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 280, in _translate_resource_to_schema\n    resources = cls._get_document_resources(document)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 272, in _get_document_resources\n    return document['definitions']\nKeyError: 'definitions'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent 
call last):\n  File \"/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py\", line 649, in do_work\n    work_executor.execute()\n  File \"/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py\", line 179, in execute\n    op.start()\n  File \"dataflow_worker/shuffle_operations.py\", line 63, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start\n  File \"dataflow_worker/shuffle_operations.py\", line 64, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start\n  File \"dataflow_worker/shuffle_operations.py\", line 79, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start\n  File \"dataflow_worker/shuffle_operations.py\", line 80, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start\n  File \"dataflow_worker/shuffle_operations.py\", line 84, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start\n  File \"apache_beam/runners/worker/operations.py\", line 356, in apache_beam.runners.worker.operations.Operation.output\n  File \"apache_beam/runners/worker/operations.py\", line 218, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive\n  File \"dataflow_worker/shuffle_operations.py\", line 261, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.process\n  File \"dataflow_worker/shuffle_operations.py\", line 268, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.process\n  File \"apache_beam/runners/worker/operations.py\", line 356, in apache_beam.runners.worker.operations.Operation.output\n  File \"apache_beam/runners/worker/operations.py\", line 218, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive\n  File \"apache_beam/runners/worker/operations.py\", line 703, in apache_beam.runners.worker.operations.DoOperation.process\n  File \"apache_beam/runners/worker/operations.py\", line 704, in apache_beam.runners.worker.operations.DoOperation.process\n  File 
\"apache_beam/runners/common.py\", line 1215, in apache_beam.runners.common.DoFnRunner.process\n  File \"apache_beam/runners/common.py\", line 1279, in apache_beam.runners.common.DoFnRunner._reraise_augmented\n  File \"apache_beam/runners/common.py\", line 1213, in apache_beam.runners.common.DoFnRunner.process\n  File \"apache_beam/runners/common.py\", line 569, in apache_beam.runners.common.SimpleInvoker.invoke_process\n  File \"apache_beam/runners/common.py\", line 1374, in apache_beam.runners.common._OutputProcessor.process_outputs\n  File \"apache_beam/runners/worker/operations.py\", line 218, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive\n  File \"apache_beam/runners/worker/operations.py\", line 703, in apache_beam.runners.worker.operations.DoOperation.process\n  File \"apache_beam/runners/worker/operations.py\", line 704, in apache_beam.runners.worker.operations.DoOperation.process\n  File \"apache_beam/runners/common.py\", line 1215, in apache_beam.runners.common.DoFnRunner.process\n  File \"apache_beam/runners/common.py\", line 1294, in apache_beam.runners.common.DoFnRunner._reraise_augmented\n  File \"/usr/local/lib/python3.7/site-packages/future/utils/__init__.py\", line 446, in raise_with_traceback\n    raise exc.with_traceback(traceback)\n  File \"apache_beam/runners/common.py\", line 1213, in apache_beam.runners.common.DoFnRunner.process\n  File \"apache_beam/runners/common.py\", line 569, in apache_beam.runners.common.SimpleInvoker.invoke_process\n  File \"apache_beam/runners/common.py\", line 1374, in apache_beam.runners.common._OutputProcessor.process_outputs\n  File \"apache_beam/runners/worker/operations.py\", line 155, in apache_beam.runners.worker.operations.ConsumerSet.receive\n  File \"apache_beam/runners/worker/operations.py\", line 1006, in apache_beam.runners.worker.operations.PGBKCVOperation.process\n  File \"apache_beam/runners/worker/operations.py\", line 1035, in 
apache_beam.runners.worker.operations.PGBKCVOperation.process\n  File \"./asset_inventory/import_pipeline.py\", line 150, in add_input\n    resource_schema = self.element_to_schema(element)\n  File \"./asset_inventory/import_pipeline.py\", line 147, in element_to_schema\n    'iam_policy' in element)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 454, in bigquery_schema_for_resource\n    resource_name)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 102, in _get_schema_for_resource\n    for document in discovery_documents\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 102, in <listcomp>\n    for document in discovery_documents\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 280, in _translate_resource_to_schema\n    resources = cls._get_document_resources(document)\n  File \"/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py\", line 272, in _get_document_resources\n    return document['definitions']\nKeyError: \"definitions [while running 'assign_group_by_key']\"\n",
    "logger": "root:batchworker.py:do_work",
    "thread": "106:139688331458368",
    "stage": "s06"
  },
  "resource": {
    "type": "dataflow_step",
    "labels": {
      "step_id": "s5-read-shuffle43",
      "job_id": "2021-06-29_21_41_43-2985006778211866139",
      "region": "europe-west1",
      "job_name": "sre_relaunch_2021-06-29",
      "project_id": "dfdp-sre-data"
    }
  },
  "timestamp": "2021-06-30T04:51:14.832150220Z",
  "severity": "ERROR",
  "labels": {
    "dataflow.googleapis.com/region": "europe-west1",
    "dataflow.googleapis.com/job_id": "2021-06-29_21_41_43-2985006778211866139",
    "compute.googleapis.com/resource_name": "sre-relaunch-2021-06-29-06292141-1qo8-harness-f4fb",
    "compute.googleapis.com/resource_id": "8652472802243659974",
    "compute.googleapis.com/resource_type": "instance",
    "dataflow.googleapis.com/job_name": "sre_relaunch_2021-06-29"
  },
  "logName": "projects/dfdp-sre-data/logs/dataflow.googleapis.com%2Fworker",
  "receiveTimestamp": "2021-06-30T04:51:24.572961966Z"
}
boredabdel commented 3 years ago

Hi,

What specific tool are you talking about?

jf-marquis-Adeo commented 3 years ago

Hi, I'm referring to the Asset Inventory tooling: an export from CAI to a JSON dump file, then a Dataflow pipeline that ingests the data into BigQuery with a per-object split.
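For reference, the export step described above can be run with the gcloud CLI along these lines (the project, bucket, and output path below are placeholders, not values from this issue):

```shell
# Export all resource metadata from Cloud Asset Inventory to a newline-delimited
# JSON dump in Cloud Storage; the Dataflow import pipeline consumes dumps like this.
gcloud asset export \
  --project=my-project \
  --content-type=resource \
  --output-path=gs://my-bucket/cai-export/resource.json
```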

boredabdel commented 3 years ago

OK. Just so you know, this repo is maintained on a best-effort basis. If you have a fix, feel free to contribute it.

@bmenasha any idea what could cause this ?

jf-marquis-Adeo commented 3 years ago

I know, but I'm forced to use these tools because the CAI product manager doesn't want to export CAI information with history or column descriptions.

boredabdel commented 3 years ago

I understand, but please understand that this is a best-effort maintained repo.

If Ben has time to take a look, that would be good. Otherwise, I'm afraid we cannot support this.

bmenasha commented 3 years ago

Sorry for the delay, Marquis. I just updated the production pipeline with this change:

https://github.com/GoogleCloudPlatform/professional-services/pull/666

Hopefully it resolves the problem for you. Thanks.
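For context, the stack trace bottoms out in `_get_document_resources`, which returned `document['definitions']` unconditionally and so raised `KeyError` for any API document without that top-level key. A minimal defensive sketch (hypothetical function name and fallback; see PR #666 for the actual fix, and note that Swagger/OpenAPI v2 documents use `definitions` while Google Discovery documents use `schemas`):

```python
def get_document_resources(document):
    """Return the schema map from an API document, tolerating either layout."""
    if 'definitions' in document:        # Swagger/OpenAPI v2 layout
        return document['definitions']
    return document.get('schemas', {})   # Discovery-document layout, or empty fallback

# The original code was equivalent to `return document['definitions']`,
# which raises KeyError: 'definitions' on documents lacking that key.
swagger_doc = {'definitions': {'Bucket': {'type': 'object'}}}
discovery_doc = {'schemas': {'Instance': {'type': 'object'}}}
```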

boredabdel commented 3 years ago

Merged https://github.com/GoogleCloudPlatform/professional-services/pull/666

@jf-marquis-Adeo Have a look and let us know

jf-marquis-Adeo commented 3 years ago

It works, thanks a lot!
