RevolutionAnalytics / rmr2

A package that allows R developer to use Hadoop MapReduce

from.dfs returns nothing #186

Open nturenne opened 6 years ago

nturenne commented 6 years ago

Hi!

I am new to rmr2 and Hadoop, and I am running them on Windows 7 with Hadoop 2.7.1 and R 3.3.2.

When I run "to.dfs" it completes, and "mapreduce" returns no error, but "from.dfs" returns NULL, as if nothing had been computed on the nodes.

The only thing I noticed is in the Hadoop storage directory.

Below I include the log4j trace I found in a Hadoop DFS directory (path: D:\tmp\hadoop-turenne\dfs\data\current\BP-1122533003-192.168.0.28-1510766561951\current\finalized\subdir0\subdir0\blk_1073741894).

Thank you for your help,
Nico


These are my parameters:

init env R

Sys.setenv(HADOOP_CMD="D:/Soft/hadoop/bin/hadoop")
Sys.setenv(HADOOP_HOME="D:/Soft/hadoop/")
Sys.setenv("HADOOP_PREFIX"="D:/Soft/hadoop/")
Sys.setenv(HADOOP_STREAMING="D:/Soft/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar")

Sys.setenv(TMP = 'D:\\Soft\\hadoop')
library(rhdfs)
library(rmr2)
library(ravro)
.jinit()
hdfs.init()

ints = to.dfs(1:10)
17/11/15 18:23:32 WARN zlib.ZlibFactory: Failed to load/initialize native-zlib library
17/11/15 18:23:32 INFO compress.CodecPool: Got brand-new compressor [.deflate]

calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
packageJobJar: [/D:/Soft/hadoop/hadoop-unjar2499078311805680846/] [] D:\Soft\hadoop\streamjob9034814976255524682.jar tmpDir=null
17/11/15 18:23:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/11/15 18:23:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/11/15 18:23:38 INFO mapred.FileInputFormat: Total input paths to process : 1
17/11/15 18:23:38 INFO mapreduce.JobSubmitter: number of splits:2
17/11/15 18:23:38 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
17/11/15 18:23:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510766571013_0001
17/11/15 18:23:39 INFO impl.YarnClientImpl: Submitted application application_1510766571013_0001
17/11/15 18:23:39 INFO mapreduce.Job: The url to track the job: http://portlisis03:8088/proxy/application_1510766571013_0001/
17/11/15 18:23:39 INFO mapreduce.Job: Running job: job_1510766571013_0001
17/11/15 18:23:52 INFO mapreduce.Job: Job job_1510766571013_0001 running in uber mode : false
17/11/15 18:23:52 INFO mapreduce.Job: map 0% reduce 0%
17/11/15 18:24:02 INFO mapreduce.Job: map 100% reduce 0%
17/11/15 18:24:03 INFO mapreduce.Job: Job job_1510766571013_0001 completed successfully
17/11/15 18:24:04 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=244824
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=985
        HDFS: Number of bytes written=244
        HDFS: Number of read operations=14
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Job Counters
        Launched map tasks=2
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=16185
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=16185
        Total vcore-seconds taken by all map tasks=16185
        Total megabyte-seconds taken by all map tasks=16573440
    Map-Reduce Framework
        Map input records=3
        Map output records=0
        Input split bytes=192
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=214
        CPU time spent (ms)=2604
        Physical memory (bytes) snapshot=352190464
        Virtual memory (bytes) snapshot=570482688
        Total committed heap usage (bytes)=266862592
    File Input Format Counters
        Bytes Read=793
    File Output Format Counters
        Bytes Written=244
17/11/15 18:24:04 INFO streaming.StreamJob: Output directory: /Corpus/file2a107f9f2e58

from.dfs(calc)
$key
NULL

$val
NULL
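For comparison, here is a minimal plain-R sketch (no Hadoop or rmr2 involved) of what the map function computes on the same input. The names `map_fn` and `expected` are illustrative only and are not part of the session above. Assuming the job had emitted records, `from.dfs(calc)$val` would be expected to hold values like these rather than NULL; the counters in the log report `Map output records=0`.

```r
# Plain-R check of the map function, run outside Hadoop.
# map_fn and expected are hypothetical names used only for this sketch.
map_fn <- function(k, v) cbind(v, 2 * v)  # same body as the map passed to mapreduce()

# to.dfs(1:10) stores the values with no key, so the key is NULL here.
expected <- map_fn(NULL, 1:10)

print(dim(expected))  # 10 rows, 2 columns
print(expected)
```

If this runs as expected in a plain R session, the map function itself is probably not the cause, and the empty result more likely comes from how the R process is launched on the Hadoop side.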

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ravro_1.0.4 bit64_0.9-5 bit_1.1-12 rmr2_3.3.0 rhdfs_1.0.8 rJava_0.9-8

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8 digest_0.6.11 bitops_1.0-6 plyr_1.8.4 magrittr_1.5
 [6] stringi_1.1.2 reshape2_1.4.2 functional_0.6 rjson_0.2.15 RJSONIO_1.3-0
[11] tools_3.3.2 stringr_1.1.0 caTools_1.17.1

Trace of the file in Hadoop:

log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

log.txt