GoogleCloudPlatform / professional-services

Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.
Apache License 2.0

[Asset Inventory] Dataflow stalled (over 24h instead of 3h) on full CAI Resource Dump #1373

Open MarcFundenberger opened 2 weeks ago

MarcFundenberger commented 2 weeks ago

We are using the "Asset Inventory" Dataflow pipeline to load all of our resources and IAM rules into BigQuery datasets. On 18 October 2024, the "resource" run of the Dataflow pipeline suddenly stopped finishing (we cancelled it after more than 24h). Before that date, the typical running time was 3h.

The CAI dump is a full resource dump, meaning we do not specify any assetType to dump.

We tried removing the few new asset types listed in the CAI release notes from the list of asset types found in the CAI documentation. That resulted in a normal, successful load, BUT the CAI dump file is much smaller (60GiB instead of 70GiB), which indicates that we are missing some important asset types...

Is there any way to make sure that the Dataflow pipeline can handle a "full" CAI resource dump?

bmenasha commented 2 weeks ago

Marc, can you grant my Google account (bmenasha@google.com) access to the CAI export that fails to import into BigQuery? Thanks.

MarcFundenberger commented 2 weeks ago

Done, you should have access (both links point to the same file):
https://storage.cloud.google.com/adeo-resource-inventory-dump-preprod/ExportAsset-2024-10-23.dump
gs://adeo-resource-inventory-dump-preprod/ExportAsset-2024-10-23.dump

MarcFundenberger commented 2 weeks ago

From additional tests, it seems that the culprit is the "cloudbuild.googleapis.com/Build" asset type. In a "full" resource dump, I have over 1.6 million such assets. AFAIK, there is no mention of this asset type in the Cloud Asset Inventory documentation or release notes...
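
For anyone who wants to check their own dump the same way, here is a minimal sketch of how the per-type counts could be obtained. It assumes the export is newline-delimited JSON with the type stored under an `asset_type` key on each record. The helper names `count_asset_types` and `drop_asset_types` are made up for this illustration and are not part of the Asset Inventory tool.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, Set


def count_asset_types(lines: Iterable[str]) -> Counter:
    """Count records per asset type in a newline-delimited JSON dump."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        # Assumption: the type is stored under the "asset_type" key.
        counts[record.get("asset_type", "<missing>")] += 1
    return counts


def drop_asset_types(lines: Iterable[str], excluded: Set[str]) -> Iterator[str]:
    """Yield only the records whose asset type is not in `excluded`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if json.loads(line).get("asset_type") not in excluded:
            yield line


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real dump file.
    sample = [
        '{"name": "//cloudbuild/b1", "asset_type": "cloudbuild.googleapis.com/Build"}',
        '{"name": "//cloudbuild/b2", "asset_type": "cloudbuild.googleapis.com/Build"}',
        '{"name": "//compute/i1", "asset_type": "compute.googleapis.com/Instance"}',
    ]
    for asset_type, count in count_asset_types(sample).most_common():
        print(f"{count:>10}  {asset_type}")
```

On a real dump, a locally downloaded copy can be passed as an open file handle (for example `with open("ExportAsset.dump") as f: count_asset_types(f)`, where the local filename is hypothetical), since both helpers read line by line and never hold the whole file in memory. `drop_asset_types` could then write a reduced dump without the suspect type to test whether the pipeline finishes on it.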