GoogleCloudDataproc / initialization-actions

Run in all nodes of your cluster before the cluster starts - lets you customize your cluster
https://cloud.google.com/dataproc/init-actions
Apache License 2.0

Apache Drill initialisation action bug #912

Open e-compagno opened 3 years ago

e-compagno commented 3 years ago

It seems there is a bug in the Apache Drill initialisation actions:

Following instructions in https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/drill , I have created a cluster via

gcloud beta dataproc clusters create cluster-drill \
    --region us-central1 \
    --no-address \
    --zone us-central1-c \
    --single-node \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-size 500 \
    --image-version 2.0-debian10 \
    --project myproject \
    --initialization-actions 'gs://goog-dataproc-initialization-actions-us-central1/drill/drill.sh' \
    --optional-components=zookeeper

However, once the cluster is created, Drill does not run properly: sudo ./drillbit.sh status returns

/usr/lib/drill/drillbit.pid file is present but drillbit is not running.

If I try starting it manually with sudo ./drillbit.sh start, the log file /usr/lib/drill/log/drillbit.out shows this error message:

ERROR o.a.c.f.imps.CuratorFrameworkImpl - Ensure path threw exception
org.apache.zookeeper.KeeperException$UnimplementedException: KeeperErrorCode = Unimplemented for /drill

However, I am able to run a standalone version of drill with

sudo /usr/lib/drill/bin/drill-embedded
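For anyone hitting the same symptom, the check below is a minimal sketch of how to confirm that a drillbit log contains the ZooKeeper error quoted above. The function name `has_zk_unimplemented_error` is hypothetical and not part of the init action; the default log path is the one reported in this issue and may differ on other images.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Drill init action): succeed if the given
# drillbit log contains the ZooKeeper "Unimplemented" exception from this issue.
has_zk_unimplemented_error() {
  # Default path is the log file named in this issue; pass another path to override.
  local log_file="${1:-/usr/lib/drill/log/drillbit.out}"
  # -F matches the pattern as a literal string, so "$" and "." are not regex syntax.
  # -q prints nothing; the exit status is 0 only when the string is found.
  grep -Fq 'KeeperException$UnimplementedException' "$log_file"
}
```

On the master node this could be used as `has_zk_unimplemented_error && echo "ZooKeeper Unimplemented error found"`. A non-zero status means the string was not found or the log could not be read, so it does not by itself show that Drill is healthy.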

e-compagno commented 3 years ago

Apparently it's a version incompatibility issue with ZooKeeper and with the GCS connector.

To install Drill properly, the previous configuration has to be modified to

gcloud beta dataproc clusters create cluster-drill \
    --region us-central1 \
    --no-address \
    --zone us-central1-c \
    --single-node \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-size 500 \
    --image-version 2.0-debian10 \
    --project myproject \
    --initialization-actions 'gs://goog-dataproc-initialization-actions-us-central1/drill/drill.sh' \
    --optional-components=ZOOKEEPER \
    --metadata GCS_CONNECTOR_VERSION=2.0.1

In any case, I would suggest making Drill an optional component in Dataproc, so that it no longer depends on the initialisation action.