girish-Pillai opened 11 months ago
DAG_ID = "dataproc_pyspark_test" PROJECT_ID = "project_id" CLUSTER_NAME = "simplesparkjob-airflow-cluster" REGION = "us-east1" JOB_FILE_URI = "gs://bucketname/CodeFile/pyspark_test.py" STORAGE_BUCKET = "bucketname" YESTERDAY = datetime.now() - timedelta(days=1) default_dag_args = { 'depends_on_past': False, 'start_date': YESTERDAY, } CLUSTER_CONFIG = ClusterGenerator( project_id=PROJECT_ID, region=REGION, cluster_name=CLUSTER_NAME, num_workers=2, storage_bucket=STORAGE_BUCKET, num_masters=1, master_machine_type="n2-standard-2", master_disk_type="pd-standard", master_disk_size=50, worker_machine_type="n2-standard-2", worker_disk_type="pd-standard", worker_disk_size=50, properties={}, image_version="2.1-ubuntu20", autoscaling_policy=None, idle_delete_ttl=1800, metadata={"PIP_PACKAGES": 'apache-airflow apache-airflow-providers-google google-api-python-client google-auth-oauthlib google-auth-httplib2'}, init_actions_uris =["gs://goog-dataproc-initialization-actions-us-east1/python/pip-install.sh"] ).make(0) DAG to create the Cluster with DAG( DAG_ID, schedule="@once", default_args=default_dag_args, description='A simple DAG to create a Dataproc workflow', ) as dag: create_cluster = DataprocCreateClusterOperator( task_id="create_cluster", project_id=PROJECT_ID, cluster_config=CLUSTER_CONFIG, region=REGION, cluster_name=CLUSTER_NAME, ) I am getting This Error Details: [Missing GCE VM simplesparkjob-airflow-cluster-m.]" debug_error_string = "UNKNOWN:Error received from peer ipv4:108.177.12.95:443 {created_time:"2023-12-21T07:35:50.762052955+00:00", grpc_status:9, grpc_message:"Detected missing master VMs!\nDetails: [Missing GCE VM simplesparkjob-airflow-cluster-m.]"}" google.api_core.exceptions.FailedPrecondition: 400 Detected missing master VMs! Details: [Missing GCE VM simplesparkjob-airflow-cluster-m.]
DAG_ID = "dataproc_pyspark_test" PROJECT_ID = "project_id" CLUSTER_NAME = "simplesparkjob-airflow-cluster" REGION = "us-east1" JOB_FILE_URI = "gs://bucketname/CodeFile/pyspark_test.py" STORAGE_BUCKET = "bucketname"
YESTERDAY = datetime.now() - timedelta(days=1)
default_dag_args = { 'depends_on_past': False, 'start_date': YESTERDAY, }
CLUSTER_CONFIG = ClusterGenerator( project_id=PROJECT_ID, region=REGION, cluster_name=CLUSTER_NAME, num_workers=2, storage_bucket=STORAGE_BUCKET, num_masters=1, master_machine_type="n2-standard-2", master_disk_type="pd-standard", master_disk_size=50, worker_machine_type="n2-standard-2", worker_disk_type="pd-standard", worker_disk_size=50, properties={}, image_version="2.1-ubuntu20", autoscaling_policy=None, idle_delete_ttl=1800, metadata={"PIP_PACKAGES": 'apache-airflow apache-airflow-providers-google google-api-python-client google-auth-oauthlib google-auth-httplib2'}, init_actions_uris =["gs://goog-dataproc-initialization-actions-us-east1/python/pip-install.sh"] ).make(0)
with DAG( DAG_ID, schedule="@once", default_args=default_dag_args, description='A simple DAG to create a Dataproc workflow', ) as dag:
create_cluster = DataprocCreateClusterOperator( task_id="create_cluster", project_id=PROJECT_ID, cluster_config=CLUSTER_CONFIG, region=REGION, cluster_name=CLUSTER_NAME,
)
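(Aside: JOB_FILE_URI is defined above but never used in this snippet. Purely as a hedged sketch, not part of the original DAG, the PySpark job submission and cluster teardown could be chained after create_cluster inside the same with DAG(...) block, assuming the standard DataprocSubmitJobOperator and DataprocDeleteClusterOperator from the same provider module:)

from airflow.providers.google.cloud.operators.dataproc import (
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)
from airflow.utils.trigger_rule import TriggerRule

# Hypothetical job spec, built only from the constants defined above.
PYSPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "pyspark_job": {"main_python_file_uri": JOB_FILE_URI},
}

# These tasks would sit inside the same `with DAG(...)` block as create_cluster.
submit_pyspark = DataprocSubmitJobOperator(
    task_id="submit_pyspark",
    job=PYSPARK_JOB,
    region=REGION,
    project_id=PROJECT_ID,
)

delete_cluster = DataprocDeleteClusterOperator(
    task_id="delete_cluster",
    project_id=PROJECT_ID,
    cluster_name=CLUSTER_NAME,
    region=REGION,
    trigger_rule=TriggerRule.ALL_DONE,  # tear down even if the job fails
)

create_cluster >> submit_pyspark >> delete_cluster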
I am getting this error:

Details: [Missing GCE VM simplesparkjob-airflow-cluster-m.]"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:108.177.12.95:443 {created_time:"2023-12-21T07:35:50.762052955+00:00", grpc_status:9, grpc_message:"Detected missing master VMs!\nDetails: [Missing GCE VM simplesparkjob-airflow-cluster-m.]"}"
google.api_core.exceptions.FailedPrecondition: 400 Detected missing master VMs!
Details: [Missing GCE VM simplesparkjob-airflow-cluster-m.]
This is just part of the error; it's a huge log.
Can you please help me find out the issue?
Thank you
The cluster may have failed to initialize properly.
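(If initialization did fail, the cluster may be left in an ERROR state whose status detail says why the master VM disappeared, e.g. a failed or timed-out init action. A minimal sketch for pulling that detail, assuming the google-cloud-dataproc Python client and the project/region/cluster values from the DAG above:)

from google.cloud import dataproc_v1

# Values reused from the DAG above (illustrative only).
PROJECT_ID = "project_id"
REGION = "us-east1"
CLUSTER_NAME = "simplesparkjob-airflow-cluster"

# Dataproc requires the regional endpoint for cluster operations.
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

cluster = client.get_cluster(
    project_id=PROJECT_ID, region=REGION, cluster_name=CLUSTER_NAME
)

# status.detail normally carries the reason for the current state;
# status_history shows how the cluster got there.
print(cluster.status.state, cluster.status.detail)
for past in cluster.status_history:
    print(past.state, past.detail)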