gluster / glusterfs-hadoop

GlusterFS plugin for Hadoop HCFS
Apache License 2.0

Bug with the YARN streaming interface #82

Open bcornec opened 10 years ago

bcornec commented 10 years ago

Using version 2.1.6 of the glusterfs-hadoop plugin in a Hadoop 2.x / GlusterFS 3.4 environment, we get errors from the YARN streaming interface.

Software used:

- glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
- glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
- glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
- glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64
- glusterfs-hadoop-2.1.6.jar
- RHEL 6.4 with kernel 2.6.32-358.32.3.el6.x86_64

Run results:

```
-bash-4.1$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -mapper /bin/cat -input /ls-gfs.txt -output /process/
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=jayunit100@gmail.com, git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6, git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST, git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Write buffer size : 131072
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.4-Intel.jar] /tmp/streamjob2645998574693064427.jar tmpDir=null
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO mapred.FileInputFormat: Total input paths to process : 1
14/02/28 15:05:10 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/yarn/.staging/job_1393593232248_0003
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.mapred.FileInputFormat.getSplitHosts(FileInputFormat.java:508)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:298)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1234)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1231)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1231)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:589)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:584)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:584)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:575)
    at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1014)
    at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:134)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
```
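For context on the stack trace: `FileInputFormat.getSplitHosts` walks the `BlockLocation` arrays returned by the file system's `getFileBlockLocations`, and it assumes the hosts and topology-path arrays are parallel (same length, same order). One plausible reading of the `ArrayIndexOutOfBoundsException` is that the plugin returned a location where those arrays disagree. Below is a minimal sketch of that invariant, assuming the Hadoop 2.x `org.apache.hadoop.fs.BlockLocation` API; the class name and host values are made up for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.fs.BlockLocation;

public class BlockLocationInvariant {
    public static void main(String[] args) throws IOException {
        // A well-formed BlockLocation: names, hosts, and topology paths are
        // parallel arrays describing the same set of replicas.
        String[] names = { "node1:50010", "node2:50010" };          // host:port pairs
        String[] hosts = { "node1", "node2" };                      // bare hostnames
        String[] topologyPaths = { "/default-rack/node1",
                                   "/default-rack/node2" };         // rack-aware paths

        BlockLocation loc =
            new BlockLocation(names, hosts, topologyPaths, 0L, 128L * 1024 * 1024);

        // getSplitHosts indexes these arrays in parallel during split
        // computation; if a FileSystem implementation returns arrays of
        // different lengths, it can index past the end of the shorter one.
        if (loc.getHosts().length != loc.getTopologyPaths().length) {
            throw new IllegalStateException("hosts and topology paths must be parallel arrays");
        }
        System.out.println("block location arrays are consistent");
    }
}
```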

wattsteve commented 10 years ago

Your Hadoop processes should be started as the yarn user, not root. Since this issue involves the Intel Hadoop Distribution, you need to follow distribution-specific instructions to get it running properly. The instructions are available here: https://access.redhat.com/site/articles/730763/
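For anyone hitting this, one quick way to confirm which identity the Hadoop client will attach to a submitted job is `UserGroupInformation`. A minimal sketch, assuming the standard Hadoop 2.x security API (the `WhoAmI` class name is made up for illustration); the output should be `yarn` (or another dedicated service user), not `root`:

```java
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;

public class WhoAmI {
    public static void main(String[] args) throws IOException {
        // Resolve the user Hadoop considers "current"; this is the identity
        // that job submission (and the staging directory) will run under.
        UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
        System.out.println("Hadoop sees current user as: " + ugi.getShortUserName());
    }
}
```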