Open bcornec opened 10 years ago
Your Hadoop processes should be started as the yarn user, not root. Since this issue is with the Intel Hadoop Distribution, you need to follow distribution-specific instructions to get it running properly. The instructions are available here: https://access.redhat.com/site/articles/730763/
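To verify which user the daemons are actually running as, a quick sketch like the following can help. The daemon class names in the filter are the stock YARN ones; adjust the pattern for your distribution (the bracketed first letters keep the filter from matching its own process line in `ps`).

```shell
# Sketch: print the effective user of each running Hadoop/YARN daemon.
# Every line printed should show 'yarn' (or 'mapred'), never 'root'.
hadoop_daemon_users() {
    # reads "user args..." lines on stdin, prints the user of YARN daemons
    awk '/[N]odeManager|[R]esourceManager|[J]obHistoryServer/ {print $1}'
}

ps -eo user=,args= | hadoop_daemon_users
```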
Using version 2.1.6 of the glusterfs-hadoop plugin in a Hadoop 2.x / GlusterFS 3.4 environment, we hit errors with the YARN streaming interface.
SW used (on RHEL 6.4 with kernel 2.6.32-358.32.3.el6.x86_64):

```
glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-hadoop-2.1.6.jar
```
Run results:
```
-bash-4.1$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -mapper /bin/cat -input /ls-gfs.txt -output /process/
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=jayunit100@gmail.com, git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers
include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6, git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST, git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Write buffer size : 131072
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.4-Intel.jar] /tmp/streamjob2645998574693064427.jar tmpDir=null
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO mapred.FileInputFormat: Total input paths to process : 1
14/02/28 15:05:10 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/yarn/.staging/job_1393593232248_0003
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.mapred.FileInputFormat.getSplitHosts(FileInputFormat.java:508)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:298)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1234)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1231)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1231)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:589)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:584)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:584)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:575)
    at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1014)
    at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:134)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
```
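For context, the exception type points at an array lookup inside `FileInputFormat.getSplitHosts`, which indexes into the block-location arrays reported by the file system. A plausible reading (not confirmed against the Intel build's source) is that the glusterfs plugin reports fewer block locations, or shorter host arrays, than the split arithmetic expects. The sketch below is plain Java with no Hadoop dependency and purely illustrative names; it only demonstrates how that kind of mismatch yields exactly `ArrayIndexOutOfBoundsException: 1`.

```java
// Illustrative sketch of the indexing pattern that fails in getSplitHosts.
// All names are hypothetical; this does not use Hadoop classes.
public class SplitHostsSketch {
    // Stand-in for "look up the hosts of the block at a computed index":
    // blockHosts[i] plays the role of blkLocations[i].getHosts().
    static String hostForBlock(String[][] blockHosts, int index) {
        return blockHosts[index][0]; // throws if fewer locations were reported
    }

    public static void main(String[] args) {
        // HDFS-style: one location entry per block, lookup at index 1 succeeds.
        String[][] hdfsStyle = { {"node1"}, {"node2"} };
        System.out.println(hostForBlock(hdfsStyle, 1)); // prints node2

        // If the plugin reports a single location for the whole file,
        // the same lookup fails with the exception seen in the log.
        String[][] singleLocation = { {"localhost"} };
        try {
            hostForBlock(singleLocation, 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```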