anjangcp / GCP-Data-Engineering-Demo-Codes

Demo Codes will be shared here

Hi Anjan, I referred to your code to create the DAG, but my DAG fails. Can you please help me? #1

Open girish-Pillai opened 11 months ago

girish-Pillai commented 11 months ago

So I created this DAG with the following CLUSTER_CONFIG.

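from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    ClusterGenerator,
    DataprocCreateClusterOperator,
)

DAG_ID = "dataproc_pyspark_test"
PROJECT_ID = "project_id"
CLUSTER_NAME = "simplesparkjob-airflow-cluster"
REGION = "us-east1"
JOB_FILE_URI = "gs://bucketname/CodeFile/pyspark_test.py"
STORAGE_BUCKET = "bucketname"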

YESTERDAY = datetime.now() - timedelta(days=1)

default_dag_args = {
    'depends_on_past': False,
    'start_date': YESTERDAY,
}

CLUSTER_CONFIG = ClusterGenerator(
    project_id=PROJECT_ID,
    region=REGION,
    cluster_name=CLUSTER_NAME,
    num_workers=2,
    storage_bucket=STORAGE_BUCKET,
    num_masters=1,
    master_machine_type="n2-standard-2",
    master_disk_type="pd-standard",
    master_disk_size=50,
    worker_machine_type="n2-standard-2",
    worker_disk_type="pd-standard",
    worker_disk_size=50,
    properties={},
    image_version="2.1-ubuntu20",
    autoscaling_policy=None,
    idle_delete_ttl=1800,
    metadata={"PIP_PACKAGES": 'apache-airflow apache-airflow-providers-google google-api-python-client google-auth-oauthlib google-auth-httplib2'},
    init_actions_uris=["gs://goog-dataproc-initialization-actions-us-east1/python/pip-install.sh"],
).make()

DAG to create the Cluster

with DAG(
    DAG_ID,
    schedule="@once",
    default_args=default_dag_args,
    description='A simple DAG to create a Dataproc workflow',
) as dag:

    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        cluster_config=CLUSTER_CONFIG,
        region=REGION,
        cluster_name=CLUSTER_NAME,
    )

I am getting this error:

Details: [Missing GCE VM simplesparkjob-airflow-cluster-m.]"
debug_error_string = "UNKNOWN:Error received from peer ipv4:108.177.12.95:443 {created_time:"2023-12-21T07:35:50.762052955+00:00", grpc_status:9, grpc_message:"Detected missing master VMs!\nDetails: [Missing GCE VM simplesparkjob-airflow-cluster-m.]"}"
google.api_core.exceptions.FailedPrecondition: 400 Detected missing master VMs!
Details: [Missing GCE VM simplesparkjob-airflow-cluster-m.]

This is just part of the error; the full log is huge.

Can you please help me find out the issue?

Thank you

deneonitin commented 3 weeks ago

The cluster may have failed to initialize properly.
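
One way to check this is to ask Dataproc what state it recorded for the cluster. The snippet below is only a sketch, not something taken from this repo: it assumes the `google-cloud-dataproc` package is installed and that application default credentials are available. The helper names `regional_endpoint` and `describe_cluster` are made up for illustration. If the operator has already deleted the failed cluster (I believe `DataprocCreateClusterOperator` does this by default through `delete_on_error`), the lookup will raise a NotFound error instead of returning a status.

```python
# Sketch only: look up the state Dataproc recorded for a cluster.
# Assumes google-cloud-dataproc is installed and default credentials exist.
# The function names here are hypothetical helpers, not part of any library.


def regional_endpoint(region: str) -> str:
    """Build the regional Dataproc API endpoint for the given region."""
    return f"{region}-dataproc.googleapis.com:443"


def describe_cluster(project_id: str, region: str, cluster_name: str):
    """Return (state name, status detail) for an existing cluster."""
    # Imported lazily so regional_endpoint() works without the client library.
    from google.cloud import dataproc_v1

    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": regional_endpoint(region)}
    )
    cluster = client.get_cluster(
        request={
            "project_id": project_id,
            "region": region,
            "cluster_name": cluster_name,
        }
    )
    # status.state is an enum (for example ERROR or RUNNING);
    # status.detail carries Dataproc's own explanation, when it has one.
    return cluster.status.state.name, cluster.status.detail


# Example call, using the values from the DAG above:
# state, detail = describe_cluster(
#     "project_id", "us-east1", "simplesparkjob-airflow-cluster"
# )
# print(state, detail)
```

If the state comes back as ERROR, the detail string and the initialization action output in the cluster's staging bucket are probably the next places to look.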