GoogleCloudPlatform / professional-services

Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.
Apache License 2.0
2.82k stars 1.33k forks source link

Add dataproc job metric utility tool #1318

Closed CYarros10 closed 2 months ago

CYarros10 commented 2 months ago

This tool contains a python script to collect dataproc job metrics. These metrics provide deeper insight into the performance of Dataproc jobs. The user can compare dataproc job runs with different dataproc job/cluster configurations and property settings. Also helpful when comparing Dataproc jobs with on-prem hadoop, spark, etc. jobs.

This utility can be scheduled via Cloud Functions + Cloud Scheduler, Cloud Workflows, or utilized in an Airflow DAG on Cloud Composer for continuous metric collection and historical analysis.