RevolutionAnalytics / rmr2

A package that allows R developers to use Hadoop MapReduce

rmr2 example program #56

Closed raghavi222 closed 11 years ago

raghavi222 commented 11 years ago

Hello sir, I'm new to R. I've installed the rmr2 package, but I could not run the following sample program:

```r
Sys.setenv(HADOOP_HOME = "/usr/lib/hadoop")
Sys.setenv(HADOOP_CMD = "/usr/lib/hadoop/bin/hadoop")
library(rmr2)
library(rhdfs)
small.ints = 1:1000
small.int.path = to.dfs(1:1000)
out = mapreduce(input = small.int.path,
                map = function(k, v) keyval(v, v^2))
df = as.data.frame(from.dfs(out, structured = T))
```

I've posted a screenshot of my error. I'm using JDK 1.6.

I've set the following environment variables in Renviron and in my .bashrc file. I'm using CDH4 Hadoop.

```
HADOOP_HOME=/usr/lib/hadoop
HADOOP_CMD=/usr/lib/hadoop/bin/hadoop
HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.2.0.jar
JAVA_HOME=/usr/java/jdk1.7.0_21
```

piccolbo commented 11 years ago

Please see the debugging guide in the RHadoop wiki. Unfortunately, if you run your first program in distributed mode and just send me the console output, there isn't a lot for me to work with. A screenshot has the only advantage of being harder to read; I hope people stop doing that.

There's nothing I can do about it: there are multiple logs, local to the machines that are executing the different processes. That's the way Hadoop is, and there are probably good reasons for it, but either way we need to work with it. That means you either switch to standalone mode (see the Cloudera documentation for how to do that) or go and fetch the stderr logs from the web UI.

And while in this case there is nothing wrong with your program (it works for me), these are skills you need anyway to debug your own programs, so learning your way around the different modes and logs is a necessary investment.
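For what it's worth, rmr2 also offers an in-process backend via `rmr.options`, which sidesteps the distributed logs entirely while you debug. A minimal sketch (assuming the example program from above; no Hadoop daemons are involved in local mode):

```r
library(rmr2)

# Run map/reduce in-process: errors surface directly on the R
# console instead of being scattered across task logs.
rmr.options(backend = "local")

small.int.path <- to.dfs(1:1000)
out <- mapreduce(input = small.int.path,
                 map = function(k, v) keyval(v, v^2))
result <- from.dfs(out)

# Once the job works locally, switch back to the cluster.
rmr.options(backend = "hadoop")
```

The logic is identical in both backends, so a program that runs locally usually fails on the cluster only for environment reasons (classpath, environment variables, permissions), which narrows the search considerably.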

raghavi222 commented 11 years ago

Hey, thanks, I was able to resolve my error. I have one more doubt: I've installed Flume and entered all my Twitter credentials into the flume.conf file. My flume.conf is:

```
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey =
TwitterAgent.sources.Twitter.consumerSecret =
TwitterAgent.sources.Twitter.accessToken =
TwitterAgent.sources.Twitter.accessTokenSecret =
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:50070/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
```

I have also set `FLUME_CLASSPATH="/usr/lib/hadoop/flume-sources-1.0-SNAPSHOT.jar"` in flume-env.sh.

When I try to start Flume with the `/etc/init.d/flume-ng-agent start` command, I'm not receiving any tweets in my HDFS.
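When the init script fails silently like this, it can help to run the agent in the foreground with console logging so errors appear immediately instead of only in flume.log. A sketch using the standard `flume-ng` launcher (paths follow the CDH layout shown above; adjust for your install):

```shell
# Run the agent in the foreground with DEBUG output on the console.
# --name must match the agent name in flume.conf (TwitterAgent here).
flume-ng agent \
  --conf /etc/flume-ng/conf \
  --conf-file /etc/flume-ng/conf/flume.conf \
  --name TwitterAgent \
  -Dflume.root.logger=DEBUG,console
```

Stopping it is just Ctrl-C, and any startup exception is printed directly to the terminal.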

My flume.log file shows:

```
ERROR lifecycleSupervisor-1-6 - Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.IllegalStateException: consumer key/secret pair already set.
    at twitter4j.TwitterBaseImpl.setOAuthConsumer(TwitterBaseImpl.java:261)
    at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:129)
    at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
13 Jul 2013 08:42:24,127 INFO agent-shutdown-hook - Stopping lifecycle supervisor 9
13 Jul 2013 08:42:24,130 INFO agent-shutdown-hook - Component type: SINK, name: HDFS stopped
13 Jul 2013 08:42:24,130 INFO agent-shutdown-hook - Shutdown Metric for type: SINK, name: HDFS. sink.start.time == 1373729312168
13 Jul 2013 08:42:24,130 INFO agent-shutdown-hook - Shutdown Metric for type: SINK, name: HDFS. sink.stop.time == 1373730144130
13 Jul 2013 08:42:24,130 INFO agent-shutdown-hook - Shutdown Metric for type: SINK, name: HDFS. sink.batch.complete == 0
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: SINK, name: HDFS. sink.batch.empty == 106
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: SINK, name: HDFS. sink.batch.underflow == 0
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: SINK, name: HDFS. sink.connection.closed.count == 0
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: SINK, name: HDFS. sink.connection.creation.count == 0
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: SINK, name: HDFS. sink.connection.failed.count == 0
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: SINK, name: HDFS. sink.event.drain.attempt == 0
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: SINK, name: HDFS. sink.event.drain.sucess == 0
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Component type: CHANNEL, name: MemChannel stopped
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: CHANNEL, name: MemChannel. channel.start.time == 1373729312155
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: CHANNEL, name: MemChannel. channel.stop.time == 1373730144131
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: CHANNEL, name: MemChannel. channel.capacity == 10000
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: CHANNEL, name: MemChannel. channel.current.size == 0
13 Jul 2013 08:42:24,131 INFO agent-shutdown-hook - Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.put.attempt == 0
13 Jul 2013 08:42:24,132 INFO agent-shutdown-hook - Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.put.success == 0
13 Jul 2013 08:42:24,132 INFO agent-shutdown-hook - Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.take.attempt == 106
13 Jul 2013 08:42:24,132 INFO agent-shutdown-hook - Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.take.success == 0
13 Jul 2013 08:42:24,132 INFO agent-shutdown-hook - Configuration provider stopping
13 Jul 2013 08:42:29,038 INFO lifecycleSupervisor-1-0 - Configuration provider starting
13 Jul 2013 08:42:29,049 INFO conf-file-poller-0 - Reloading configuration file:/etc/flume-ng/conf/flume.conf
13 Jul 2013 08:42:29,057 INFO conf-file-poller-0 - Processing:HDFS
13 Jul 2013 08:42:29,058 INFO conf-file-poller-0 - Processing:HDFS
13 Jul 2013 08:42:29,058 INFO conf-file-poller-0 - Processing:HDFS
13 Jul 2013 08:42:29,058 INFO conf-file-poller-0 - Added sinks: HDFS Agent: TwitterAgent
13 Jul 2013 08:42:29,058 INFO conf-file-poller-0 - Processing:HDFS
13 Jul 2013 08:42:29,059 INFO conf-file-poller-0 - Processing:HDFS
13 Jul 2013 08:42:29,059 INFO conf-file-poller-0 - Processing:HDFS
13 Jul 2013 08:42:29,059 INFO conf-file-poller-0 - Processing:HDFS
13 Jul 2013 08:42:29,059 INFO conf-file-poller-0 - Processing:HDFS
13 Jul 2013 08:42:29,080 INFO conf-file-poller-0 - Post-validation flume configuration contains configuration for agents: [TwitterAgent]
13 Jul 2013 08:42:29,080 INFO conf-file-poller-0 - Creating channels
13 Jul 2013 08:42:29,082 ERROR conf-file-poller-0 - Unhandled error
java.lang.NoSuchMethodError: org.apache.flume.ChannelFactory.getClass(Ljava/lang/String;)Ljava/lang/Class;
    at org.apache.flume.node.AbstractConfigurationProvider.getOrCreateChannel(AbstractConfigurationProvider.java:236)
    at org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:199)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:101)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
```
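(A `NoSuchMethodError` on a core Flume class such as `org.apache.flume.ChannelFactory` usually means two incompatible Flume versions meet on the classpath, for example a `flume-sources-1.0-SNAPSHOT.jar` built against a different Flume release than the installed agent. One way to check, with paths that are only an assumption for a CDH-style install, is:)

```shell
# List the Flume core jars the agent sees; a version that differs
# from the one flume-sources-1.0-SNAPSHOT.jar was built against
# would explain the NoSuchMethodError above.
ls /usr/lib/flume-ng/lib/ | grep flume-ng
```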

Please help me. Thank you.

piccolbo commented 11 years ago

Please direct your question to the appropriate Flume forum or issue tracker.