LLNL / magpie

Magpie contains a number of scripts for running Big Data software in HPC environments, including Hadoop and Spark. There is support for Lustre, Slurm, Moab, Torque, LSF, Flux, and more.
GNU General Public License v2.0

What do I need to do to get a working copy of magpie? #275

Open grod-uy opened 6 years ago

grod-uy commented 6 years ago

Hi, after some days of trying to start using magpie, I don't know what to do.

I'm trying to run the basic terasort example, but when I submit the job to a Slurm cluster I get no output. This is my sbatch file; could someone tell me what I'm doing wrong?



```sh
#!/bin/sh
#############################################################################
#  Copyright (C) 2013-2015 Lawrence Livermore National Security, LLC.
#  Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
#  Written by Albert Chu <chu11@llnl.gov>
#  LLNL-CODE-644248
#
#  This file is part of Magpie, scripts for running Hadoop on
#  traditional HPC systems.  For details, see https://github.com/llnl/magpie.
#
#  Magpie is free software; you can redistribute it and/or modify it
#  under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  Magpie is distributed in the hope that it will be useful, but
#  WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
#  General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with Magpie.  If not, see <http://www.gnu.org/licenses/>.
#############################################################################

############################################################################
# SLURM Customizations
############################################################################

# Node count.  Node count should include one node for the
# head/management/master node.  For example, if you want 8 compute
# nodes to process data, specify 9 nodes below.
#
# If including Zookeeper, include expected Zookeeper nodes.  For
# example, if you want 8 Hadoop compute nodes and 3 Zookeeper nodes,
# specify 12 nodes (1 master, 8 Hadoop, 3 Zookeeper)
#
# Also take into account additional nodes needed for other services,
# for example HDFS federation.
#
# Many of the below can be configured on the sbatch command line.  If
# you are more comfortable specifying these on the command line, feel
# free to delete the customizations below.

#SBATCH --nodes=2
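# Note (derived from the node count comments above): with --nodes=2, one
# node acts as the head/master node, leaving a single compute node.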
#SBATCH --output="slurm-%j.out"

# Note the defaults of MAGPIE_STARTUP_TIME & MAGPIE_SHUTDOWN_TIME; this
# time limit should be a fair amount larger than the two combined.
#SBATCH --time=300

# Job name.  This will be used in naming directories for the job.
#SBATCH --job-name=pruebas-hadoop-intento-

# Partition to launch job in
#SBATCH --partition=normal

## SLURM Values
# Generally speaking, don't touch the following miscellaneous configuration.

#SBATCH --ntasks-per-node=1
#SBATCH --exclusive
#SBATCH --no-kill

# Need to tell Magpie how you are submitting this job
export MAGPIE_SUBMISSION_TYPE="sbatchsrun"

############################################################################
# Magpie Configurations
############################################################################

# Directory where your launch scripts/files are stored
#
# Normally an NFS mount, someplace Magpie can be reached on all nodes.
export MAGPIE_SCRIPTS_HOME="/home/root/magpie"

# Path to store data local to each cluster node, typically something
# in /tmp.  This will store local conf files and log files for your
# job.  If local scratch space is not available, consider using the
# MAGPIE_NO_LOCAL_DIR option.  See README for more details.
#
export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie"

# Magpie job type
#
# "hadoop" - Run a job according to the settings of HADOOP_MODE.
#
# "testall" - Run a job that runs all basic sanity tests for all
#             software that is configured to be setup.  This is a good
#             way to sanity check that everything has been setup
#             correctly and the way you like.
#
#             For Hadoop, testall will run terasort
#
# "script" - Run an arbitrary script, as specified by
#            MAGPIE_SCRIPT_PATH.  This functionality is very similar to
#            setting "script" in HADOOP_MODE or HBASE_MODE or
#            SPARK_MODE.
#
#            It is primarily used if you want to launch without
#            Hadoop/Hbase/Spark and are experimenting with things.
#
# "interactive" - manually interact with the job run.  This functionality
#                 is very similar to setting "interactive" in
#                 HADOOP_MODE, HBASE_MODE, SPARK_MODE, etc.  It is
#                 primarily used if you want to launch without
#                 Hadoop/Hbase/Spark/etc. and are experimenting with
#                 things.
#
export MAGPIE_JOB_TYPE="script"

# Specify script to execute for "script" mode in MAGPIE_JOB_TYPE
#
export MAGPIE_SCRIPT_PATH="/home/root/magpie/examples/hadoop-example-job-script"

# Specify arguments for script specified in MAGPIE_SCRIPT_PATH
#
# Note that many Magpie generated environment variables are not
# generated until the job has launched.  You won't be able to use them
# here
#
# export MAGPIE_SCRIPT_ARGS="" 

# Specify script startup / shutdown time window
#
# Specifies the amount of time to give startup / shutdown activities a
# chance to succeed before Magpie will give up (or in the case of
# shutdown, when the resource manager/scheduler may kill the running
# job).  Defaults to 30 minutes for startup, 30 minutes for shutdown.
#
# The startup time in particular may need to be increased if you have
# a large amount of data.  As an example, HDFS may need to spend a
# significant amount of time determining all of the blocks in HDFS
# before leaving safemode.
#
# The shutdown time in particular may need to be increased if you have
# a large amount of cleanup to be done.  HDFS will save its namespace
# before shutting down.  Hbase will do a compaction before shutting
# down.
#
# The startup & shutdown window must together be smaller than the
# timelimit specified for the job.
#
# MAGPIE_STARTUP_TIME and MAGPIE_SHUTDOWN_TIME at minimum must be 5
# minutes.  If MAGPIE_POST_JOB_RUN is specified below,
# MAGPIE_SHUTDOWN_TIME must be at minimum 10 minutes.
#
export MAGPIE_STARTUP_TIME=30
export MAGPIE_SHUTDOWN_TIME=30
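
# Illustrative arithmetic: with the #SBATCH --time=300 (minutes) limit
# above and the 30 + 30 minute startup/shutdown windows here, roughly
# 300 - 60 = 240 minutes remain for the job itself.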

# Magpie One Time Run
#
# Normally, Magpie assumes that when a user runs a job, data created
# and stored within that job may need to be accessed again.  For
# example, data created and stored within HDFS will be accessed again.
#
# Under a number of scenarios, this may not be desired.  For example
# during testing.  In order to improve job throughput, you can set
# MAGPIE_ONE_TIME_RUN below to yes.  Magpie will assume that this is a
# one time run and the user will never care about any data that may
# have been created.  This will allow Magpie to take shortcuts to
# improve job throughput.  For example, job teardown may be done more
# quickly as we do not care about tearing down cleanly for future
# runs.
#
# export MAGPIE_ONE_TIME_RUN=yes

# Convenience Scripts
#
# Specify a script to be executed before / after your job.  It is run
# on all nodes.
#
# Typically the pre-job script is used to set something up or get
# debugging info.  It can also be used to determine if system
# conditions meet the expectations of your job.  The primary job
# running script (magpie-run) will not be executed if the
# MAGPIE_PRE_JOB_RUN exits with a non-zero exit code.
#
# The post-job script is typically used for cleaning up something or
# gathering info (such as logs) for post-debugging/analysis.  If it is
# set, MAGPIE_SHUTDOWN_TIME above must be > 5.
#
# See example magpie-example-pre-job-script and
# magpie-example-post-job-script for ideas of what you can do w/ these
# scripts
#
# A number of convenient scripts are available in the
# ${MAGPIE_SCRIPTS_HOME}/scripts directory.
#
# export MAGPIE_PRE_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/pre-job-run-scripts/my-pre-job-script"
# export MAGPIE_POST_JOB_RUN="${MAGPIE_SCRIPTS_HOME}/scripts/post-job-run-scripts/my-post-job-script"

# Environment Variable Script
#
# When working with Magpie interactively by logging into the master
# node of your job allocation, many environment variables may need to
# be set.  For example, environment variables for config file
# directories (e.g. HADOOP_CONF_DIR, HBASE_CONF_DIR, etc.) and home
# directories (e.g. HADOOP_HOME, HBASE_HOME, etc.) and more general
# environment variables (e.g. JAVA_HOME) may need to be set before you
# begin interacting with your big data setup.
#
# The standard job output from Magpie provides instructions on all the
# environment variables typically needed to interact with your job.
# However, this can be tedious if done by hand.
#
# If the environment variable specified below is set, Magpie will
# create the file and put into it every environment variable that
# would be useful when running your job interactively.  That way, it
# can be sourced easily if you will be running your job interactively.
# It can also be loaded or used by other job scripts.
#
# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT="${HOME}/my-job-env"
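#
# For example (illustrative), once the job is running you could log into
# the master node and run:
#
#   source ${HOME}/my-job-env
#
# to pick up HADOOP_CONF_DIR, HADOOP_HOME, JAVA_HOME, etc.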

# Environment Variable Shell Type
#
# Magpie outputs environment variables in help output and
# MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT based on your SHELL environment
# variable.
#
# If you would like to output in a different shell type (perhaps you
# have programmed scripts in a different shell), specify that shell
# here.
#
# export MAGPIE_ENVIRONMENT_VARIABLE_SCRIPT_SHELL="/bin/bash"

# Remote Shell
#
# Magpie requires a passwordless remote shell command to launch
# necessary daemons across your job allocation.  Magpie defaults to
# ssh, but it may be an alternate command in some environments.  An
# alternate ssh-equivalent remote command can be specified by setting
# MAGPIE_REMOTE_CMD below.
#
# If using ssh, Magpie requires keys to be setup ahead of time so it
# can be executed without passwords.
#
# Specify options to the remote shell command if necessary.
#
# export MAGPIE_REMOTE_CMD="ssh"
# export MAGPIE_REMOTE_CMD_OPTS=""

############################################################################
# General Configuration
############################################################################

# Necessary for most projects
export JAVA_HOME="/usr"

############################################################################
# Hadoop Core Configurations
############################################################################

# Should Hadoop be run
#
# Specify yes or no.  Defaults to no.
#
export HADOOP_SETUP=yes

# Set Hadoop Setup Type
#
# Will inform scripts on how to setup config files and what daemons to
# launch/setup.  The hadoop build/binaries set by HADOOP_HOME
# needs to match up with what you set here.
#
# MR1 - MapReduce/Hadoop 1.0 w/ HDFS
# MR2 - MapReduce/Hadoop 2.0 w/ HDFS
# HDFS1 - HDFS only w/ Hadoop 1.0
# HDFS2 - HDFS only w/ Hadoop 2.0
#
# The HDFS only options may be useful when you want to use HDFS with
# other big data software, such as Hbase, and do not care for using
# Hadoop MapReduce.  It only works with HDFS based
# HADOOP_FILESYSTEM_MODE, such as "hdfs", "hdfsoverlustre", or
# "hdfsovernetworkfs".
#
export HADOOP_SETUP_TYPE="MR2"
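
# Note (illustrative): "MR2" must pair with a Hadoop 2.x build, which is
# consistent with the HADOOP_VERSION and HADOOP_HOME settings below.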

# Version
#
# Make sure the version for Mapreduce version 1 or 2 matches whatever
# you set in HADOOP_SETUP_TYPE
#
export HADOOP_VERSION="2.9.0"

# Path to your Hadoop build/binaries
#
# Make sure the build for MapReduce or HDFS version 1 or 2 matches
# whatever you set in HADOOP_SETUP_TYPE.
#
# This should be accessible on all nodes in your allocation. Typically
# this is in an NFS mount.
#
export HADOOP_HOME="/home/root/bigdata/hadoop-${HADOOP_VERSION}"

# Path to store data local to each cluster node, typically something
# in /tmp.  This will store local conf files and log files for your
# job.  If local scratch space is not available, consider using the
# MAGPIE_NO_LOCAL_DIR option.  See README for more details.
#
# This will not be used for storing intermediate files or
# distributed cache files.  See HADOOP_LOCALSTORE below for that.
#
export HADOOP_LOCAL_DIR="/tmp/${USER}/hadoop"

# Directory where alternate Hadoop configuration templates are stored
#
# If you wish to tweak the configuration files used by Magpie, set
# HADOOP_CONF_FILES below, copy configuration templates from
# $MAGPIE_SCRIPTS_HOME/conf/hadoop into HADOOP_CONF_FILES, and modify
# as you desire.  Magpie will still use configuration files in
# $MAGPIE_SCRIPTS_HOME/conf/hadoop if any of the files it needs are
# not found in HADOOP_CONF_FILES.
#
# export HADOOP_CONF_FILES="${HOME}/myconf"
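#
# For example (illustrative):
#
#   mkdir -p ${HOME}/myconf
#   cp ${MAGPIE_SCRIPTS_HOME}/conf/hadoop/* ${HOME}/myconf
#   export HADOOP_CONF_FILES="${HOME}/myconf"
#
# and then edit the copies in ${HOME}/myconf as desired.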

# Daemon Heap Max
#
# Heap maximum for Hadoop daemons (i.e. ResourceManager, NodeManager,
# DataNode, History Server, etc.), specified in megs.  Special case
# for Namenode, see below.
#
# If not specified, defaults to Hadoop default of 1000
#
# May need to be increased if you are scaling large, get OutofMemory
# errors, or perhaps have a lot of cores on a node.
#
# export HADOOP_DAEMON_HEAP_MAX=2000

# Daemon Namenode Heap Max
#
# Heap maximum for Hadoop Namenode daemons specified in megs.
#
# If not specified, defaults to HADOOP_DAEMON_HEAP_MAX above.
#
# Unlike most Hadoop daemons, namenode may need more memory if there
# are a very large number of files in your HDFS setup.  A general rule
# of thumb is a 1G heap for each 100T of data.
#
# export HADOOP_NAMENODE_DAEMON_HEAP_MAX=2000
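#
# Illustrative arithmetic for the rule of thumb above: roughly 500T of
# HDFS data suggests about a 5G heap, e.g.
# HADOOP_NAMENODE_DAEMON_HEAP_MAX=5000 (megs).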

# Environment Extra
#
# Specify extra environment information that should be passed into
# Hadoop.  This file will simply be appended into the hadoop-env.sh
# and (if appropriate) yarn-env.sh.
#
# By default, a reasonable estimate for max user processes and open
# file descriptors will be calculated and put into hadoop-env.sh and
# (if appropriate) yarn-env.sh.  However, it's always possible they may
# need to be set differently. Everyone's cluster/situation can be
# slightly different.
#
# See the example-environment-extra file for examples of what you
# can/should do when adding extra environment settings.
#
# export HADOOP_ENVIRONMENT_EXTRA_PATH="${HOME}/hadoop-my-environment"

############################################################################
# Hadoop Job/Run Configurations
############################################################################

# Set how Hadoop should run
#
# "terasort" - run terasort.  Useful for making sure things are setup
#              the way you like.
#
#              There are additional configuration options for this
#              listed below.
#
# "script" - execute a script that lists all of your Hadoop jobs.  Be
#            sure to set HADOOP_SCRIPT_PATH to your script.
#
# "interactive" - manually interact to submit jobs, peruse HDFS, etc.
#                 also useful for moving data in/out of HDFS.  In this
#                 mode you'll login to the cluster node that is your
#                 'master' node and interact with Hadoop directly
#                 (e.g. bin/hadoop ...)
#
# "upgradehdfs" - upgrade your version of HDFS.  Most notably this is
#                 used when you are switching to a newer Hadoop
#                 version and the HDFS version would be inconsistent
#                 without upgrading.  Only works with HDFS versions >=
#                 2.2.0.
#
#                 Please set your job time to be quite large when
#                 performing this upgrade.  If your job times out and
#                 this process does not complete fully, it can leave
#                 HDFS in a bad state.
#
#                 Beware, once you upgrade it'll be difficult to rollback.
#
# "decommissionhdfsnodes" - decrease your HDFS over Lustre or HDFS
#                           over NetworkFS node size just as if you
#                           were on a cluster with local disk.  Launch
#                           your job with the current present node
#                           size and set
#                           HADOOP_DECOMMISSION_HDFS_NODE_SIZE to the
#                           smaller node size to decommission into.
#                           Only works on Hadoop versions >= 2.3.0.
#
#                           Please set your job time to be quite large
#                           when performing this update.  If your job
#                           times out and this process does not
#                           complete fully, it can leave HDFS in a bad
#                           state.
#
# "launch" - Launch Hadoop but do nothing, usually set to this because
#            another project (e.g. Hbase, Pig) will run something that
#            uses Hadoop MapReduce.
#
# "setuponly" - Like 'interactive' but only set up conf files.  Useful
#               if the user wants to set up & tear down daemons themselves.
#
# "hdfsonly" - For use if HADOOP_SETUP_TYPE is set to HDFS1 or HDFS2.
#
export HADOOP_MODE="terasort"

# Tasks per Node
#
# If not specified, a reasonable estimate will be calculated based on
# number of CPUs on the system.
#
# If running Hbase (or other Big Data software) with Hadoop MapReduce,
# be aware of the number of tasks and the amount of memory that may be
# needed by other software.
#
# export HADOOP_MAX_TASKS_PER_NODE=8

# Default Map tasks for Job
#
# If not specified, defaults to HADOOP_MAX_TASKS_PER_NODE * compute
# nodes.
#
# If running Hbase (or other Big Data software) with Hadoop MapReduce,
# be aware of the number of tasks and the amount of memory that may be
# needed by other software.
#
# export HADOOP_DEFAULT_MAP_TASKS=8
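#
# Illustrative arithmetic: with HADOOP_MAX_TASKS_PER_NODE=8 and, say, 8
# compute nodes, the default would be 8 * 8 = 64 map tasks.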

# Default Reduce tasks for Job
#
# If not specified, defaults to # compute nodes (i.e. 1 reducer per
# node)
#
# If running Hbase (or other Big Data software) with Hadoop MapReduce,
# be aware of the number of tasks and the amount of memory that may be
# needed by other software.
#
# export HADOOP_DEFAULT_REDUCE_TASKS=8

# Max Map tasks for Task Tracker
#
# If not specified, defaults to HADOOP_MAX_TASKS_PER_NODE
#
# If running Hbase (or other Big Data software) with Hadoop MapReduce,
# be aware of the number of tasks and the amount of memory that may be
# needed by other software.
#
# export HADOOP_MAX_MAP_TASKS=8

# Max Reduce tasks for Task Tracker
#
# If not specified, defaults to HADOOP_MAX_TASKS_PER_NODE
#
# If running Hbase (or other Big Data software) with Hadoop MapReduce,
# be aware of the number of tasks and the amount of memory that may be
# needed by other software.
#
# export HADOOP_MAX_REDUCE_TASKS=8

# Heap size for JVM
#
# Specified in M.  If not specified, a reasonable estimate will be
# calculated based on total memory available and number of CPUs on the
# system.
#
# HADOOP_CHILD_MAP_HEAPSIZE and HADOOP_CHILD_REDUCE_HEAPSIZE are for
# Yarn (i.e. HADOOP_SETUP_TYPE = MR2)
#
# If HADOOP_CHILD_MAP_HEAPSIZE is not specified, it is assumed to be
# HADOOP_CHILD_HEAPSIZE.
#
# If HADOOP_CHILD_REDUCE_HEAPSIZE is not specified, it is assumed to
# be 2X the HADOOP_CHILD_MAP_HEAPSIZE.
#
# If running Hbase (or other Big Data software) with Hadoop MapReduce,
# be aware of the number of tasks and the amount of memory that may be
# needed by other software.
#
# export HADOOP_CHILD_HEAPSIZE=2048
# export HADOOP_CHILD_MAP_HEAPSIZE=2048
# export HADOOP_CHILD_REDUCE_HEAPSIZE=4096

# Container Buffer
#
# Specify the amount of overhead each Yarn container will have over
# the heap size.  Specified in M.  If not specified, a reasonable
# estimate will be calculated based on total memory available.
#
# export HADOOP_CHILD_MAP_CONTAINER_BUFFER=256
# export HADOOP_CHILD_REDUCE_CONTAINER_BUFFER=512
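#
# Illustrative arithmetic: each Yarn container request is roughly the
# heap size plus this buffer, e.g. a 2048M map heap plus a 256M buffer
# yields a ~2304M map container.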

# Mapreduce Slowstart, indicating percent of maps that should complete
# before reducers begin.
#
# If not specified, defaults to 0.05
#
# export HADOOP_MAPREDUCE_SLOWSTART=0.05

# Container Memory
#
# Memory on compute nodes for containers.  Typically a nice chunk less
# than the actual memory on the machine, because the machine needs memory
# for its own needs (kernel, daemons, etc.).  Specified in megs.
#
# If not specified, a reasonable estimate will be calculated based on
# total memory on the system.
#
# export YARN_RESOURCE_MEMORY=32768

# Check Memory Limits
# 
# Should physical and virtual memory limits be enforced for containers.
# This can be helpful in cases where the OS (Centos/Redhat) is aggressive
# at allocating virtual memory and causes the vmem-to-pmem ratio to be
# hit. Defaults to true
#
# export YARN_VMEM_CHECK="false"
# export YARN_PMEM_CHECK="false"

# Compression
#
# Should compression of outputs and intermediate data be enabled.
# Specify yes or no.  Defaults to no.
#
# Effectively, is the time spent compressing data going to save you time
# on I/O?  Sometimes yes, sometimes no.
#
# export HADOOP_COMPRESSION=yes

# IO Sort Factors + MB
#
# The number of streams of files to sort while reducing and the memory
# amount to use while sorting.  Tuning these is quite advanced and takes
# many factors into account.  If not specified, some reasonable
# number will be calculated.
#
# export HADOOP_IO_SORT_FACTOR=10
# export HADOOP_IO_SORT_MB=100

# Parallel Copies
#
# The default number of parallel transfers run by reduce during the
# copy(shuffle) phase.  If not specified, some reasonable number will
# be calculated.
# export HADOOP_PARALLEL_COPIES=10

############################################################################
# Hadoop Filesystem Mode Configurations
############################################################################

# Set how the filesystem should be setup
#
# "hdfs" - Normal straight up HDFS if you have local disk in your
#          cluster.  This option is primarily for benchmarking and
#          caching, but probably shouldn't be used in the general case.
#
#          Be careful running this in a cluster environment.  The next
#          time you execute your job, if a different set of nodes are
#          allocated to you, the HDFS data you wrote from a previous
#          job may not be there.  Specifying specific nodes to use in
#          your job submission (e.g. --nodelist in sbatch) may be a
#          way to alleviate this.
#
#          User must set HADOOP_HDFS_PATH below.
#
# "hdfsoverlustre" - HDFS over Lustre.  See README for description.
#
#                    User must set HADOOP_HDFSOVERLUSTRE_PATH below.
#
# "hdfsovernetworkfs" - HDFS over Network FS.  Identical to HDFS over
#                       Lustre, but filesystem agnostic.
#
#                       User must set HADOOP_HDFSOVERNETWORKFS_PATH below.
#
# "rawnetworkfs" - Use Hadoop RawLocalFileSystem (i.e. file: scheme),
#           to use a networked file system directly.  It could be a
#           Lustre mount or NFS mount.  Whatever you please.
#
#           User must set HADOOP_RAWNETWORKFS_PATH below.
#
export HADOOP_FILESYSTEM_MODE="hdfsoverlustre"

# Local Filesystem BlockSize
#
# This configuration is the blocksize hadoop will use when doing I/O
# to a local filesystem.  It is used by HDFS when reading from the
# underlying filesystem.  It is also used with
# HADOOP_FILESYSTEM_MODE="rawnetworkfs".
#
# Commonly 33554432, 67108864, 134217728 (i.e. 32m, 64m, 128m)
#
# If not specified, defaults to 33554432
#
# export HADOOP_LOCAL_FILESYSTEM_BLOCKSIZE=33554432

# HDFS Replication
#
# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre",
# and "hdfsovernetworkfs"
#
# HDFS commonly uses 3.  When doing HDFS over Lustre/NetworkFS, higher
# replication can also help with resilience if nodes fail.  You may
# wish to set this to < 3 to save space.
#
# If not specified, defaults to 3
#
# export HADOOP_HDFS_REPLICATION=3

# HDFS Block Size
#
# This is used with HADOOP_FILESYSTEM_MODE="hdfs", "hdfsoverlustre",
# and "hdfsovernetworkfs"
#
# Commonly 134217728, 268435456, 536870912 (i.e. 128m, 256m, 512m)
#
# If not specified, defaults to 134217728
#
# export HADOOP_HDFS_BLOCKSIZE=134217728

# Path for HDFS when using local disk
#
# This is used with HADOOP_FILESYSTEM_MODE="hdfs"
#
# If you want to specify multiple paths (such as multiple drives),
# make them comma separated (e.g. /dir1,/dir2,/dir3).  The multiple
# paths will be used for local intermediate data and HDFS.  The first
# path will also store daemon data, such as namenode or jobtracker
# data.
#
export HADOOP_HDFS_PATH="/ssd/${USER}/hdfs"

# HDFS cleanup
#
# This is used with HADOOP_FILESYSTEM_MODE="hdfs"
#
# After your job has completed, if HADOOP_HDFS_PATH_CLEAR is set to
# yes, Magpie will do a rm -rf on HADOOP_HDFS_PATH.  
#
# This is particularly useful when doing normal HDFS on local storage.
# On your next job run, you may not be able to get the same nodes you
# had before.  So you may want to clean up your work before the
# next user uses the node.
#
# export HADOOP_HDFS_PATH_CLEAR="yes"

# Lustre path to do Hadoop HDFS out of
#
# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre"
#
# Note that different versions of Hadoop may not be compatible with
# your current HDFS data.  If you're going to switch around to
# different versions, perhaps set different paths for different data.
#
export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/${USER}/hdfsoverlustre/"
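
# For example (illustrative), per-version paths such as
# /lustre/${USER}/hdfsoverlustre-2.9.0/ keep HDFS data for different
# Hadoop versions separate.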

# HDFS over Lustre ignore lock
#
# This is used with HADOOP_FILESYSTEM_MODE="hdfsoverlustre"
#
# Cleanup in_use.lock files before launching HDFS
#
# On traditional Hadoop clusters, the in_use.lock file protects
# against a second HDFS daemon running on the same node.  The lock
# file can similarly protect against a second HDFS daemon running on
# another node of your cluster (which is not desired, as both
# namenodes could change namenode data at the same time).
#
# However, sometimes the lock file may be there due to a prior job
# that failed and locks were not cleaned up on teardown.  This may
# prohibit new HDFS daemons from running correctly.
#
# By default, if this option is not set, the lock file will be left in
# place and may cause HDFS daemons to not start.  If set to yes, the
# lock files will be removed before starting HDFS.
#
# export HADOOP_HDFSOVERLUSTRE_REMOVE_LOCKS=yes

# Networkfs path to do Hadoop HDFS out of
#
# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs"
#
# Note that different versions of Hadoop may not be compatible with
# your current HDFS data.  If you're going to switch around to
# different versions, perhaps set different paths for different data.
#
export HADOOP_HDFSOVERNETWORKFS_PATH="/networkfs/${USER}/hdfsovernetworkfs/"

# HDFS over Networkfs ignore lock
#
# This is used with HADOOP_FILESYSTEM_MODE="hdfsovernetworkfs"
#
# Cleanup in_use.lock files before launching HDFS
#
# On traditional Hadoop clusters, the in_use.lock file protects
# against a second HDFS daemon running on the same node.  The lock
# file can similarly protect against a second HDFS daemon running on
# another node of your cluster (which is not desired, as both
# namenodes could change namenode data at the same time).
#
# However, sometimes the lock file may be there due to a prior job
# that failed and locks were not cleaned up on teardown.  This may
# prohibit new HDFS daemons from running correctly.
#
# By default, if this option is not set, the lock file will be left in
# place and may cause HDFS daemons to not start.  If set to yes, the
# lock files will be removed before starting HDFS.
#
# export HADOOP_HDFSOVERNETWORKFS_REMOVE_LOCKS=yes

# Path for rawnetworkfs
#
# This is used with HADOOP_FILESYSTEM_MODE="rawnetworkfs"
#
export HADOOP_RAWNETWORKFS_PATH="/lustre/${USER}/rawnetworkfs/"

# If you have a local SSD or NVRAM, performance may be better to store
# intermediate data on it rather than Lustre or some other networked
# filesystem.  If the below environment variable is specified, local
# intermediate data will be stored in the specified directory.
# Otherwise it will go to an appropriate directory in Lustre/networked
# FS.
#
# Be wary, local SSDs/NVRAM stores may have less space than HDDs or
# networked file systems.  It can be easy to run out of space.
#
# If you want to specify multiple paths (such as multiple drives),
# make them comma separated (e.g. /dir1,/dir2,/dir3).  The multiple
# paths will be used for local intermediate data.
#
# export HADOOP_LOCALSTORE="/ssd/${USER}/localstore/"

# HADOOP_LOCALSTORE_CLEAR 
#
# After your job has completed, if HADOOP_LOCALSTORE_CLEAR is set to
# yes, Magpie will do a rm -rf on all directories in
# HADOOP_LOCALSTORE.  This is particularly useful if the localstore
# directory is on local storage and you want to clean up your work
# before the next user uses the node.
#
# export HADOOP_LOCALSTORE_CLEAR="yes"

# Option to use unique locations per job to store hdfs data
#
# If this is set to yes the nodes will append the job id to the
# current HDFSOVERLUSTRE and HDFSOVERNETWORKFS path thus keeping the
# hdfs data isolated per job. This enables the same script to be
# executed multiple times (usually with different data) without the
# HDFSOVERXXX instances colliding with each other.
#
# Be careful to cleanup the HDFSOVERXXX directories from time to time,
# as Magpie will not clear data from prior jobs.
#
# export HADOOP_PER_JOB_HDFS_PATH="yes"

############################################################################
# Hadoop Terasort Configurations
############################################################################

# Terasort size
#
# For "terasort" mode.
#
# Terasort size is specified as a number of 100-byte rows.  Specify
# 10000000000 (i.e. one terabyte) for actual benchmarking.
#
# Specify something small, for basic sanity tests.
#
# Defaults to 50000000.
#
# export HADOOP_TERASORT_SIZE=50000000
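#
# Illustrative arithmetic: each terasort row is 100 bytes, so
# 10000000000 rows is roughly 1 TB of teragen input, while the default
# of 50000000 rows is roughly 5 GB.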

# Terasort map count
#
# For "terasort" mode during the teragen of data.
#
# If not specified, will be computed to a reasonable number given
# HADOOP_TERASORT_SIZE and the block size of the filesystem you are
# using (e.g. for HDFS the HADOOP_HDFS_BLOCKSIZE)
#
# export HADOOP_TERAGEN_MAP_COUNT=4

# Terasort reducer count
#
# For "terasort" mode during the actual terasort of data.
#
# If not specified, will be compute node count * 2.
#
# export HADOOP_TERASORT_REDUCER_COUNT=4

# Terasort cache
#
# For "real benchmarking" you should flush page cache between a
# teragen and a terasort.  You can disable this for sanity runs/tests
# to make things go faster.  Specify yes or no.  Defaults to yes.
#
# export HADOOP_TERASORT_CLEAR_CACHE=no

# Terasort output replication count
#
# For "terasort" mode during the actual terasort of data
#
# In some circumstances, replication of the output from the terasort
# must be equal to the replication of data for the input.  In other
# cases it can be less.  The below can be adjusted to tweak for
# benchmarking purposes.
#
# If not specified, defaults to Terasort default, which is 1 in most
# versions of Hadoop
#
# export HADOOP_TERASORT_OUTPUT_REPLICATION=1

# Terachecksum
#
# For "terasort" mode after the teragen of data
#
# After executing the teragen, run terachecksum to calculate a checksum of
# the input.
#
# If both this and HADOOP_TERASORT_RUN_TERAVALIDATE are set, the
# checksums will be compared afterwards for equality.
#
# Defaults to no
#
# export HADOOP_TERASORT_RUN_TERACHECKSUM=no

# Teravalidate
#
# For "terasort" mode after the actual terasort of data
#
# After executing the sort, run teravalidate to validate the sorted data.
#
# If both this and HADOOP_TERASORT_RUN_TERACHECKSUM are set, the
# checksums will be compared afterwards for equality.
#
# Defaults to no
#
# export HADOOP_TERASORT_RUN_TERAVALIDATE=no

############################################################################
# Hadoop Script Configurations
############################################################################

# Specify script to execute for "script" mode
#
# See examples/hadoop-example-job-script for example of what to put in
# the script.
#
# export HADOOP_SCRIPT_PATH="${HOME}/my-job-script"

# Specify arguments for script specified in HADOOP_SCRIPT_PATH
#
# Note that many Magpie generated environment variables, such as
# HADOOP_MASTER_NODE, are not generated until the job has launched.
# You won't be able to use them here.
#
# export HADOOP_SCRIPT_ARGS="" 

############################################################################
# Hadoop Decommission HDFS Nodes Configurations
############################################################################

# Specify decommission node size for "decommissionhdfsnodes" mode
#
# For example, if your current HDFS node size is 16, your job size is
# likely 17 nodes (including the master).  If you wish to decommission
# to 8 data nodes (job size of 9 nodes total), set this to 8.
#
# export HADOOP_DECOMMISSION_HDFS_NODE_SIZE=8

############################################################################
# Run Job
############################################################################

srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-check-inputs
if [ $? -ne 0 ]
then
    exit 1
fi
srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-setup-core
if [ $? -ne 0 ]
then
    exit 1
fi
srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-setup-projects
if [ $? -ne 0 ]
then
    exit 1
fi
srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-setup-post
if [ $? -ne 0 ]
then
    exit 1
fi
srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-pre-run
if [ $? -ne 0 ]
then
    exit 1
fi

srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-run
srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-cleanup
srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-post-run

```
chu11 commented 6 years ago

> when I execute the job in a slurm cluster I get no output.

You should minimally get some error output. I can't speak to your slurm setup, but as an experiment, I'd stick some echoes in your sbatch file to make sure you're getting stdout properly, e.g. stick some echoes before

srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-check-inputs

and make sure you get them.
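
For example, a minimal sanity check (illustrative; the echo lines are not part of Magpie) would be:

```sh
# Temporary debugging output to confirm stdout is reaching the Slurm
# output file before any Magpie script runs.
echo "sbatch script started on $(hostname) at $(date)"
echo "MAGPIE_SCRIPTS_HOME=${MAGPIE_SCRIPTS_HOME}"

srun --no-kill -W 0 $MAGPIE_SCRIPTS_HOME/magpie-check-inputs
```

If even those echoes never show up in slurm-&lt;jobid&gt;.out, the problem is with how Slurm is capturing stdout rather than with Magpie itself.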

grod-uy commented 6 years ago

Thanks Albert, I already added some echoes but had no luck; the script runs to the end but the output is empty.

chu11 commented 6 years ago

There's clearly something up with your slurm setup. As an experiment, delete:

#SBATCH --output="slurm-%j.out"

and just specify an output file via --output to sbatch.

If that doesn't work, you'll probably have to contact slurm people for support. I'm not sure why your slurm isn't outputting anything.
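
For example (illustrative; substitute the name of your actual submission file):

```sh
sbatch --output="${HOME}/magpie-debug-%j.out" my-magpie-sbatch-file
```

If that file is still empty after the job finishes, the issue is almost certainly on the Slurm side rather than in Magpie.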