GoogleCloudDataproc / initialization-actions

Run in all nodes of your cluster before the cluster starts - lets you customize your cluster
https://cloud.google.com/dataproc/init-actions
Apache License 2.0
588 stars 512 forks

[oozie] intermittent error writing to HDFS during init action #1077

Closed by cjac 4 months ago

cjac commented 1 year ago

The oozie init action script can fail intermittently when a cluster runs it before HDFS is fully online. In the trace below, the hadoop fs -put step fails because no DataNodes are running yet:

+ hadoop fs -put -f /tmp/oozie-install-m95M/share /user/oozie/
2023-08-07 20:41:35,938 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/oozie/share/lib/pig/hadoop-yarn-client-3.3.3.jar._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2315)

cjac commented 1 year ago

fixed in #1089
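
The change made in #1089 is not reproduced in this thread. As a rough sketch of one general way to tolerate HDFS coming up late, assuming the write is safe to repeat (the -put -f in the trace overwrites its target), a small retry wrapper could look like the following. The function name retry_command and the variable install_dir are hypothetical and do not come from the init action itself.

```shell
#!/bin/bash
# Hypothetical helper, not taken from the init action or from #1089.
# Retries a command until it succeeds or the attempts run out.
# Usage: retry_command <max_attempts> <delay_seconds> <command> [args...]
retry_command() {
  local -r max_attempts="$1"
  local -r delay_seconds="$2"
  shift 2

  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # Run the command exactly as given; stop at the first success.
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts} failed: $*" >&2
    # Wait before the next attempt, but not after the last one.
    if ((attempt < max_attempts)); then
      sleep "${delay_seconds}"
    fi
  done
  return 1
}

# Example call, left commented out so the snippet has no side effects.
# install_dir is a placeholder for the temporary oozie install directory:
# retry_command 10 30 hadoop fs -put -f "${install_dir}/share" /user/oozie/
```

With 10 attempts and a 30 second delay, the upload would be retried for roughly five minutes before the script gives up; both values are illustrative and would need tuning for a real cluster.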

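The fix may need to be paired with a startup script that staggers master startup, passed at cluster creation with --metadata startup-script-url="${INIT_ACTIONS_ROOT}/delay-masters-startup.sh". The script delays each master node by 60 seconds times its index: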

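#!/bin/bash
# Stagger master startup: master N waits N minutes before proceeding.

set -x

readonly ROLE="$(/usr/share/google/get_metadata_value attributes/dataproc-role)"
# Only master nodes are delayed; all other nodes exit immediately.
if [[ "${ROLE}" != 'Master' ]]; then set +x; exit 0; fi

# Extract the master index from a hostname such as <cluster>-m-2.
# Default to 0 when the hostname has no numeric suffix, so the arithmetic below stays valid.
node_number="$(echo "${HOSTNAME}" | perl -ne '/-m-(\d+)/; print $1')"
node_number="${node_number:-0}"
delay_seconds=$((node_number * 60))
sleep "${delay_seconds}s"

NOW="$(date +"%F-%T")"
echo "instance #${node_number} (${HOSTNAME}) proceeds at ${NOW}" | tee /var/log/delay-masters.log

set +x

cjac commented 4 months ago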

This issue appears to be resolved.