Hi! I am a newbie using rmr2 and Hadoop on Windows 7, with Hadoop 2.7.1 and R 3.3.2.
When I run "to.dfs" it succeeds, and "mapreduce" returns no error, but "from.dfs" returns NULL, as if nothing is computed on the nodes.
I include below the log4j trace I found in a Hadoop DFS storage directory
( path: D:\tmp\hadoop-turenne\dfs\data\current\BP-1122533003-192.168.0.28-1510766561951\current\finalized\subdir0\subdir0\blk_1073741894 )
ints = to.dfs(1:10)
17/11/15 18:23:32 WARN zlib.ZlibFactory: Failed to load/initialize native-zlib library
17/11/15 18:23:32 INFO compress.CodecPool: Got brand-new compressor [.deflate]
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
packageJobJar: [/D:/Soft/hadoop/hadoop-unjar2499078311805680846/] [] D:\Soft\hadoop\streamjob9034814976255524682.jar tmpDir=null
17/11/15 18:23:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/11/15 18:23:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/11/15 18:23:38 INFO mapred.FileInputFormat: Total input paths to process : 1
17/11/15 18:23:38 INFO mapreduce.JobSubmitter: number of splits:2
17/11/15 18:23:38 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
17/11/15 18:23:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510766571013_0001
17/11/15 18:23:39 INFO impl.YarnClientImpl: Submitted application application_1510766571013_0001
17/11/15 18:23:39 INFO mapreduce.Job: The url to track the job: http://portlisis03:8088/proxy/application_1510766571013_0001/
17/11/15 18:23:39 INFO mapreduce.Job: Running job: job_1510766571013_0001
17/11/15 18:23:52 INFO mapreduce.Job: Job job_1510766571013_0001 running in uber mode : false
17/11/15 18:23:52 INFO mapreduce.Job: map 0% reduce 0%
17/11/15 18:24:02 INFO mapreduce.Job: map 100% reduce 0%
17/11/15 18:24:03 INFO mapreduce.Job: Job job_1510766571013_0001 completed successfully
17/11/15 18:24:04 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=244824
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=985
HDFS: Number of bytes written=244
HDFS: Number of read operations=14
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=16185
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=16185
Total vcore-seconds taken by all map tasks=16185
Total megabyte-seconds taken by all map tasks=16573440
Map-Reduce Framework
Map input records=3
Map output records=0
Input split bytes=192
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=214
CPU time spent (ms)=2604
Physical memory (bytes) snapshot=352190464
Virtual memory (bytes) snapshot=570482688
Total committed heap usage (bytes)=266862592
File Input Format Counters
Bytes Read=793
File Output Format Counters
Bytes Written=244
17/11/15 18:24:04 INFO streaming.StreamJob: Output directory: /Corpus/file2a107f9f2e58
from.dfs(calc)
$key
NULL
$val
NULL
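As a sanity check, the same pipeline can be run on rmr2's "local" backend, which executes everything in the R session and bypasses Hadoop streaming entirely. This is a minimal sketch using the same commands as above; if it returns values, the R side is fine and the problem lies in the Hadoop streaming setup (it requires a working rmr2 installation, so it cannot be run without that environment):

```r
library(rmr2)

# Switch to the local backend: no Hadoop jobs are launched,
# data is written to local temp files instead of HDFS.
rmr.options(backend = "local")

ints <- to.dfs(1:10)
calc <- mapreduce(input = ints, map = function(k, v) cbind(v, 2 * v))

# With the local backend, $val should be a 10 x 2 matrix
# (columns v and 2*v) rather than NULL.
from.dfs(calc)
```

If this works but the "hadoop" backend still returns NULL, the issue is in the streaming job itself, not in the R code.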
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Thanks in advance for any help. — nico

These are my parameters:
Initialize the R environment:

Sys.setenv(HADOOP_CMD = "D:/Soft/hadoop/bin/hadoop")
Sys.setenv(HADOOP_HOME = "D:/Soft/hadoop/")
Sys.setenv(HADOOP_PREFIX = "D:/Soft/hadoop/")
Sys.setenv(HADOOP_STREAMING = "D:/Soft/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar")
Sys.setenv(TMP = "D:/Soft/hadoop")  # note: 'D:\Soft\hadoop' is an invalid R string (unescaped backslashes); use forward slashes or \\
library(rhdfs)
library(rmr2)
library(ravro)
.jinit()
hdfs.init()
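A quick way to verify that the paths set above actually exist and that HDFS is reachable (a sketch assuming the environment variables and `hdfs.init()` call from the question; it requires the rhdfs package and a running HDFS, so it cannot be executed standalone):

```r
library(rhdfs)

# Both should print TRUE; a FALSE here means the streaming jar or
# hadoop binary path in Sys.setenv() is wrong.
file.exists(Sys.getenv("HADOOP_CMD"))
file.exists(Sys.getenv("HADOOP_STREAMING"))

# Should list the HDFS root without error once hdfs.init() has run;
# the job's output directory from the log can be inspected the same way.
hdfs.ls("/")
```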
locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ravro_1.0.4 bit64_0.9-5 bit_1.1-12  rmr2_3.3.0  rhdfs_1.0.8 rJava_0.9-8

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8    digest_0.6.11  bitops_1.0-6   plyr_1.8.4     magrittr_1.5
 [6] stringi_1.1.2  reshape2_1.4.2 functional_0.6 rjson_0.2.15   RJSONIO_1.3-0
[11] tools_3.3.2    stringr_1.1.0  caTools_1.17.1
Trace of the file in Hadoop:

log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.