gluster / glusterfs-hadoop

GlusterFS plugin for Hadoop HCFS
Apache License 2.0

Bug with the YARN streaming interface #82

Open bcornec opened 10 years ago

bcornec commented 10 years ago

Using version 2.1.6 of the glusterfs-hadoop plugin in a Hadoop 2.x / GlusterFS 3.4 environment, we get errors from the YARN streaming interface.

Software used:

- glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
- glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
- glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
- glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64
- glusterfs-hadoop-2.1.6.jar
- RHEL 6.4 with kernel 2.6.32-358.32.3.el6.x86_64

Run results:

```
-bash-4.1$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -mapper /bin/cat -input /ls-gfs.txt -output /process/
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=jayunit100@gmail.com, git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6, git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST, git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Write buffer size : 131072
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.4-Intel.jar] /tmp/streamjob2645998574693064427.jar tmpDir=null
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO mapred.FileInputFormat: Total input paths to process : 1
14/02/28 15:05:10 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/yarn/.staging/job_1393593232248_0003
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.mapred.FileInputFormat.getSplitHosts(FileInputFormat.java:508)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:298)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1234)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1231)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1231)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:589)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:584)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:584)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:575)
    at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1014)
    at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:134)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
```
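For context on the stack trace: `FileInputFormat.getSplitHosts` walks the `BlockLocation` arrays returned by the file system's `getFileBlockLocations`, and it assumes the hosts and topology-path arrays are parallel (same length, same order). One plausible reading of the `ArrayIndexOutOfBoundsException` is that the plugin returned a location where those arrays disagree. Below is a minimal sketch of that invariant, assuming the Hadoop 2.x `org.apache.hadoop.fs.BlockLocation` API; the class name and host values are made up for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.fs.BlockLocation;

public class BlockLocationInvariant {
    public static void main(String[] args) throws IOException {
        // A well-formed BlockLocation: names, hosts, and topology paths are
        // parallel arrays describing the same set of replicas.
        String[] names = { "node1:50010", "node2:50010" };          // host:port pairs
        String[] hosts = { "node1", "node2" };                      // bare hostnames
        String[] topologyPaths = { "/default-rack/node1",
                                   "/default-rack/node2" };         // rack-aware paths

        BlockLocation loc =
            new BlockLocation(names, hosts, topologyPaths, 0L, 128L * 1024 * 1024);

        // getSplitHosts indexes these arrays in parallel during split
        // computation; if a FileSystem implementation returns arrays of
        // different lengths, it can index past the end of the shorter one.
        if (loc.getHosts().length != loc.getTopologyPaths().length) {
            throw new IllegalStateException("hosts and topology paths must be parallel arrays");
        }
        System.out.println("block location arrays are consistent");
    }
}
```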

wattsteve commented 10 years ago

Your Hadoop processes should be started as the yarn user, not root. Since this issue involves the Intel Hadoop Distribution, you need to follow distribution-specific instructions to get it running properly. The instructions are available here: https://access.redhat.com/site/articles/730763/
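For anyone hitting this, one quick way to confirm which identity the Hadoop client will attach to a submitted job is `UserGroupInformation`. A minimal sketch, assuming the standard Hadoop 2.x security API (the `WhoAmI` class name is made up for illustration); the output should be `yarn` (or another dedicated service user), not `root`:

```java
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;

public class WhoAmI {
    public static void main(String[] args) throws IOException {
        // Resolve the user Hadoop considers "current"; this is the identity
        // that job submission (and the staging directory) will run under.
        UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
        System.out.println("Hadoop sees current user as: " + ugi.getShortUserName());
    }
}
```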