Closed: creggian closed this issue 12 years ago.
I have more information. I looked at R/mapreduce.R in the rmr2 package, and there are many functions whose names begin with dfs.<...>, but from the R console only dfs.empty and dfs.size are available.
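As an aside, unexported functions of an installed package can still be reached with R's `:::` operator, so redefining them by hand in the console shouldn't be necessary. A minimal sketch (assuming dfs.tempfile is present as an internal object in your installed rmr2 build):

```
library(rmr2)

# ':::' reaches objects a package keeps internal (i.e. not listed
# in its NAMESPACE exports), unlike '::' which sees exports only
tmp <- rmr2:::dfs.tempfile()

# list all internal objects whose names start with "dfs."
ls(getNamespace("rmr2"), pattern = "^dfs\\.")
```

Note this relies on internal API that the package authors are free to change between versions.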
So I defined dfs.tempfile in the console and called it:
```
> dfs.tempfile()
function() {fname}
<environment: 0x8e71a5c>
```

which is the same output as in the error above.
EDIT: to install rmr2, I ran (from memory):

```
$ sudo R CMD build rmr2...tar.gz
$ sudo R CMD INSTALL <dir>/pkg
```
This seems to have worked just fine for me. If you want to see the numbers, you can wrap the mapreduce call in a from.dfs call.
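To make that suggestion concrete, here is a minimal sketch of wrapping the tutorial's mapreduce call in from.dfs (rmr2 API as used in the tutorial; the exact printed shape of the result may vary by rmr2 version):

```
library(rmr2)

small.ints <- to.dfs(1:1000)

# mapreduce() returns only a reference to the HDFS output (a closure
# that prints as 'function() {fname}'), not the data itself;
# from.dfs() pulls the key-value pairs back into the R session
result <- from.dfs(
  mapreduce(input = small.ints,
            map = function(k, v) cbind(v, v^2)))

# keys(result) / values(result) access the two sides of the pairs;
# here the values hold the computed (v, v^2) matrix
head(values(result))
```

So the `function() {fname}` line in the transcript is the normal return value of mapreduce, not an error.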
Sent from a phone

On Oct 18, 2012 2:10 AM, "Claudio Reggiani" notifications@github.com wrote:
I'm following the rmr2 tutorial.
```
small.ints = to.dfs(1:1000)
mapreduce(input = small.ints, map = function(k,v) cbind(v,v^2))
```
and this is the log I get:
```
> library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: itertools
Loading required package: iterators
Loading required package: digest
Loading required package: functional
> small.ints = to.dfs(1:1000)
12/10/18 07:42:50 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/18 07:42:50 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/18 07:42:50 INFO compress.CodecPool: Got brand-new compressor
> mapreduce(input = small.ints, map = function(k,v) cbind(v,v^2))
packageJobJar: [/tmp/RtmpmulW54/rmr-local-envff576eb9847, /tmp/RtmpmulW54/rmr-global-envff552d9f066, /tmp/RtmpmulW54/rmr-streaming-mapff53d4f7035, /tmp/hadoop-cloudera/hadoop-unjar8328009872758481942/] [] /tmp/streamjob3885039090765379535.jar tmpDir=null
12/10/18 07:43:02 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/18 07:43:03 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-cloudera/mapred/local]
12/10/18 07:43:03 INFO streaming.StreamJob: Running job: job_201210180740_0001
12/10/18 07:43:03 INFO streaming.StreamJob: To kill this job, run:
12/10/18 07:43:03 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201210180740_0001
12/10/18 07:43:03 INFO streaming.StreamJob: Tracking URL: http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201210180740_0001
12/10/18 07:43:04 INFO streaming.StreamJob: map 0% reduce 0%
12/10/18 07:43:17 INFO streaming.StreamJob: map 100% reduce 0%
12/10/18 07:43:20 INFO streaming.StreamJob: map 100% reduce 100%
12/10/18 07:43:20 INFO streaming.StreamJob: Job complete: job_201210180740_0001
12/10/18 07:43:20 INFO streaming.StreamJob: Output: /tmp/RtmpmulW54/fileff56d81a980
function ()
{
    fname
}
<environment: 0x8d52498>
```
Everything seems to run, but it doesn't appear to complete the output, and I don't have enough information to know where to look.
Here are the userlogs:
```
[cloudera@localhost job_201210180740_0001]$ cat attempt_201210180740_0001_m_000000_0/stderr
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: methods
Loading required package: itertools
Loading required package: iterators
Loading required package: digest
Loading required package: functional
[cloudera@localhost job_201210180740_0001]$ cat attempt_201210180740_0001_m_000001_0/stderr
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: methods
Loading required package: itertools
Loading required package: iterators
Loading required package: digest
Loading required package: functional
[cloudera@localhost job_201210180740_0001]$ cat attempt_201210180740_0001_m_000002_0/stderr
[cloudera@localhost job_201210180740_0001]$ cat attempt_201210180740_0001_m_000003_0/stderr
[cloudera@localhost job_201210180740_0001]$ cat attempt_201210180740_0001_m_00000
attempt_201210180740_0001_m_000000_0/ attempt_201210180740_0001_m_000001_0/ attempt_201210180740_0001_m_000002_0/ attempt_201210180740_0001_m_000003_0/
[cloudera@localhost job_201210180740_0001]$ cat attempt_201210180740_0001_m_000001_0/
log.index stderr stdout syslog
[cloudera@localhost job_201210180740_0001]$ cat attempt_201210180740_0001_m_000001_0/log.index
LOG_DIR:/usr/lib/hadoop-0.20/bin/../logs/userlogs/job_201210180740_0001/attempt_201210180740_0001_m_000001_0
stdout:0 -1
stderr:0 -1
syslog:0 -1
[cloudera@localhost job_201210180740_0001]$ cat attempt_201210180740_0001_m_000001_0/stdout
[cloudera@localhost job_201210180740_0001]$ cat attempt_201210180740_0001_m_000001_0/syslog
2012-10-18 07:43:08,699 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-10-18 07:43:08,901 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-cloudera/mapred/local/taskTracker/cloudera/jobcache/job_201210180740_0001/jars/job.jar <- /tmp/hadoop-cloudera/mapred/local/taskTracker/cloudera/jobcache/job_201210180740_0001/attempt_201210180740_0001_m_000001_0/work/job.jar
2012-10-18 07:43:08,917 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-cloudera/mapred/local/taskTracker/cloudera/jobcache/job_201210180740_0001/jars/.job.jar.crc <- /tmp/hadoop-cloudera/mapred/local/taskTracker/cloudera/jobcache/job_201210180740_0001/attempt_201210180740_0001_m_000001_0/work/.job.jar.crc
2012-10-18 07:43:08,927 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-cloudera/mapred/local/taskTracker/cloudera/jobcache/job_201210180740_0001/jars/rmr-local-envff576eb9847 <- /tmp/hadoop-cloudera/mapred/local/taskTracker/cloudera/jobcache/job_201210180740_0001/attempt_201210180740_0001_m_000001_0/work/rmr-local-envff576eb9847
2012-10-18 07:43:08,937 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-cloudera/mapred/local/taskTracker/cloudera/jobcache/job_201210180740_0001/jars/rmr-streaming-mapff53d4f7035 <- /tmp/hadoop-cloudera/mapred/local/taskTracker/cloudera/jobcache/job_201210180740_0001/attempt_201210180740_0001_m_000001_0/work/rmr-streaming-mapff53d4f7035
2012-10-18 07:43:08,946 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-cloudera/mapred/local/taskTracker/cloudera/jobcache/job_201210180740_0001/jars/rmr-global-envff552d9f066 <- /tmp/hadoop-cloudera/mapred/local/taskTracker/cloudera/jobcache/job_201210180740_0001/attempt_201210180740_0001_m_000001_0/work/rmr-global-envff552d9f066
2012-10-18 07:43:09,031 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2012-10-18 07:43:09,428 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2012-10-18 07:43:09,466 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@73a7ab
2012-10-18 07:43:09,782 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2012-10-18 07:43:09,785 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2012-10-18 07:43:09,810 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2012-10-18 07:43:10,137 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-10-18 07:43:10,200 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/Rscript, rmr-streaming-mapff53d4f7035]
2012-10-18 07:43:14,687 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2012-10-18 07:43:14,704 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2012-10-18 07:43:15,114 INFO org.apache.hadoop.mapred.Task: Task:attempt_201210180740_0001_m_000001_0 is done. And is in the process of commiting
2012-10-18 07:43:16,285 INFO org.apache.hadoop.mapred.Task: Task attempt_201210180740_0001_m_000001_0 is allowed to commit now
2012-10-18 07:43:16,333 INFO org.apache.hadoop.mapred.FileOutputCommitter: Saved output of task 'attempt_201210180740_0001_m_000001_0' to hdfs://localhost:9000/tmp/RtmpmulW54/fileff56d81a980
2012-10-18 07:43:16,364 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201210180740_0001_m_000001_0' done.
2012-10-18 07:43:16,372 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
```
This is the environment I'm working in:
- CentOS 5.8
- $ java -version
  java version "1.6.0_22"
  OpenJDK Runtime Environment (IcedTea6 1.10.4) (rhel-1.24.1.10.4.el5-i386)
  OpenJDK Client VM (build 20.0-b11, mixed mode)
- $ hadoop version
  Hadoop 0.20.2-cdh3u5
  Subversion file:///data/1/tmp/topdir/BUILD/hadoop-0.20.2-cdh3u5 -r de14a95
  Compiled by root on Wed Aug 22 14:57:44 PDT 2012
  From source with checksum 32e743fc1528087177062231df2d5171
- R version 2.15.1 (2012-06-22)
- rmr2 version 2.0.0