Closed: andrewmilkowski closed this issue 11 years ago
It's not clear to me where the read.table call is coming in, as fitRandomForest.R only consumes data from Hadoop. Perhaps some map tasks are somehow calling keyval with no data?
I would also cross-post on the rmr repo, as it appears the error is generated in an rmr function.
Will do. I believe you are right in this particular test case scenario.
An rmr.str(v) at the beginning of the map function would clarify the issue. It seems Uri's interpretation is correct, but it raises the question of why that happens.
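If the full structure dump turns out to be too verbose for a wide data frame, a one-line shape summary may be enough to spot an empty input. The sketch below uses base R only; `describe_input` is a hypothetical helper name, not part of rmr2, and it is meant as a complement to rmr.str rather than a replacement.

```r
# Hypothetical helper (not part of rmr2): summarise an object in one line
# and write that line to stderr, where the task logs are collected.
describe_input <- function(x) {
  # Data frames report rows and columns; other objects report length only.
  dims <- if (is.data.frame(x)) dim(x) else c(length(x), NA)
  line <- sprintf("class=%s rows=%d cols=%s", class(x)[1], dims[1], dims[2])
  cat(line, "\n", file = stderr())
  invisible(line)
}

describe_input(data.frame(a = 1:3))
describe_input(integer(0))
```

Calling it on the value argument at the top of the map function would show at a glance whether a task received zero rows.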
Antonio,
let me transfer this comment and further discussion to the rmr2 ticket area (https://github.com/RevolutionAnalytics/rmr2/issues/69), so as to isolate the issue to the correct component, for now...
I have added the proposed debug statement at the beginning of the mapper function,
MAP function
poisson.subsample <- function(k, input) { rmr.str(input)
The following is the output in the stderr logs:
v), values(kv))
$ : language rmr.str(input)
input
'data.frame': 10 obs. of 74 variables:
$ SalePrice : num 26500 9500 19000 11500 65000 24000 38500 13500 21500 36000
$ ModelID.x : Factor w/ 9 levels "21442","2232",..: 8 1 7 3 6 8 4 9 5 2
$ datasource : Factor w/ 1 level "121": 1 1 1 1 1 1 1 1 1 1
$ auctioneerID : Factor w/ 1 level "3": 1 1 1 1 1 1 1 1 1 1
$ YearMade : num 2004 2003 1999 1991 1000 ...
$ MachineHoursCurrentMeter: num 508 0 2450 8005 20700 ...
$ UsageBand : Factor w/ 3 levels "High","Low","Medium": 2 NA 3 3 3 3 1 2 2 NA
$ saledate : Factor w/ 10 levels "2005-10-20","2005-11-17",..: 7 9 3 2 5 6 10 4 8 1
$ fiModelDesc.x : Factor w/ 9 levels "310E","310G",..: 2 5 1 6 7 2 8 3 4 9
$ fiBaseModel.x : Factor w/ 8 levels "310","334","430",..: 1 4 1 5 6 1 7 2 3 8
$ fiSecondaryDesc.x : Factor w/ 6 levels "B","E","G","HAG",..: 3 5 2 6 1 3 NA NA 4 NA
$ fiModelSeries.x : Factor w/ 2 levels "-6E","LC": NA NA NA NA NA NA 1 NA NA 2
$ fiModelDescriptor.x : int NA NA NA NA NA NA NA NA NA 6
$ ProductSize : Factor w/ 4 levels "Large","Large / Medium",..: NA 3 NA NA 1 NA 4 3 3 2
$ fiProductClassDesc.x : Factor w/ 6 levels "Backhoe Loader - 14.0 to 15.0 Ft Standard Digging Depth",..: 1 5 1 1 6 1 2 4 4 3
$ state : Factor w/ 8 levels "Arizona","Arkansas",..: 1 8 2 4 3 6 7 3 7 5
$ ProductGroup.x : Factor w/ 3 levels "BL","TEX","WL": 1 2 1 1 3 1 2 2 2 2
$ ProductGroupDesc.x : Factor w/ 3 levels "Backhoe Loaders",..: 1 2 1 1 3 1 2 2 2 2
$ Drive_System : Factor w/ 2 levels "Four Wheel Drive",..: 1 NA 2 2 NA 1 NA NA NA NA
$ Enclosure : Factor w/ 3 levels "EROPS","EROPS w AC",..: 3 1 3 1 2 3 2 1 1 1
$ Forks : logi NA NA NA NA NA NA ...
$ Pad_Type : Factor w/ 1 level "Street": NA NA NA NA NA 1 NA NA NA NA
$ Ride_Control : Factor w/ 1 level "No": 1 NA 1 1 NA 1 NA NA NA NA
$ Stick : Factor w/ 2 levels "Extended","Standard": 1 NA 2 2 NA 2 NA NA NA NA
$ Transmission : Factor w/ 2 levels "Powershuttle",..: 1 NA 2 2 NA 2 NA NA NA NA
$ Turbocharged : logi NA NA NA NA NA NA ...
$ Blade_Extension : logi NA NA NA NA NA NA ...
$ Blade_Width : logi NA NA NA NA NA NA ...
$ Enclosure_Type : logi NA NA NA NA NA NA ...
$ Engine_Horsepower : logi NA NA NA NA NA NA ...
$ Hydraulics : Factor w/ 2 levels "2 Valve","Auxiliary": NA 2 NA NA 1 NA 1 2 2 2
$ Pushblock : logi NA NA NA NA NA NA ...
$ Ripper : logi NA NA NA NA NA NA ...
$ Scarifier : logi NA NA NA NA NA NA ...
$ Tip_Control : logi NA NA NA NA NA NA ...
$ Tire_Size : logi NA NA NA NA NA NA ...
$ Coupler : Factor w/ 1 level "Manual": NA NA NA NA NA NA NA NA 1 NA
$ Coupler_System : logi NA NA NA NA NA NA ...
$ Grouser_Tracks : logi NA NA NA NA NA NA ...
$ Hydraulics_Flow : logi NA NA NA NA NA NA ...
$ Track_Type : Factor w/ 2 levels "Rubber","Steel": NA 2 NA NA NA NA NA 1 1 2
$ Undercarriage_Pad_Width : int NA 16 NA NA NA NA NA NA NA NA
$ Stick_Length : num NA NA NA NA NA NA NA NA NA 132
$ Thumb : logi NA NA NA NA NA NA ...
$ Pattern_Changer : logi NA NA NA NA NA NA ...
$ Grouser_Type : Factor w/ 1 level "Double": NA 1 NA NA NA NA NA 1 1 1
$ Backhoe_Mounting : logi NA NA NA NA NA NA ...
$ Blade_Type : logi NA NA NA NA NA NA ...
$ Travel_Controls : logi NA NA NA NA NA NA ...
$ Differential_Type : Factor w/ 1 level "Standard": NA NA NA NA 1 NA NA NA NA NA
$ Steering_Controls : Factor w/ 1 level "Conventional": NA NA NA NA 1 NA NA NA NA NA
$ saledatenumeric : num 14231 14637 13468 13104 13734 ...
$ ageAtSale : num 1539 2311 2603 5161 367746 ...
$ saleYear : num 2008 2010 2006 2005 2007 ...
$ saleMonth : Factor w/ 7 levels "August","December",..: 2 3 6 6 1 1 5 4 1 7
$ saleDay : Factor w/ 10 levels "09","14","16",..: 5 10 3 4 1 8 6 2 9 7
$ saleWeekday : Factor w/ 1 level "Thursday": 1 1 1 1 1 1 1 1 1 1
$ MedianModelPrice : int 25250 9500 19000 11500 65000 25250 38500 13500 21500 36000
$ ModelCount : num 2 1 1 1 1 2 1 1 1 1
$ ModelID.y : Factor w/ 9 levels "16705","21442",..: 8 2 7 4 6 8 1 9 5 3
$ fiModelDesc.y : Factor w/ 9 levels "310E","310G",..: 2 5 1 6 7 2 9 3 4 8
$ fiBaseModel.y : Factor w/ 8 levels "310","334","430",..: 1 4 1 5 6 1 8 2 3 7
$ fiSecondaryDesc.y : Factor w/ 6 levels "B","E","G","LC",..: 3 5 2 6 1 3 NA NA NA 4
$ fiModelSeries.y : int NA NA NA NA NA NA -6 NA NA 6
$ fiModelDescriptor.y : Factor w/ 1 level "LK": NA NA NA NA NA NA NA NA NA 1
$ fiProductClassDesc.y : Factor w/ 6 levels "Backhoe Loader - 14.0 to 15.0 Ft Standard Digging Depth",..: 1 3 1 1 6 1 5 2 2 4
$ ProductGroup.y : Factor w/ 3 levels "BL","TEX","WL": 1 2 1 1 3 1 3 2 2 2
$ ProductGroupDesc.y : Factor w/ 3 levels "Backhoe Loaders",..: 1 2 1 1 3 1 3 2 2 2
$ MfgYear : num 2004 2003 1999 1991 1987 ...
$ fiManufacturerID : Factor w/ 6 levels "103","121","25",..: 6 4 6 3 5 6 1 2 2 1
$ fiManufacturerDesc : Factor w/ 6 levels "Bobcat","Case",..: 5 4 5 2 3 5 6 1 1 6
$ PrimarySizeBasis : Factor w/ 3 levels "Horsepower","Standard Digging Depth - Ft",..: 2 3 2 2 1 2 1 3 3 3
$ PrimaryLower : int 14 4 14 14 350 14 225 3 3 40
$ PrimaryUpper : int 15 5 15 15 500 15 250 4 4 50
Dotted pair list of 12
$ : language (function() { load("./rmr-local-env9432cc02004") ...
$ : language rmr2:::map.loop(map = map, keyval.reader = input.reader(), keyval.writer = if (is.null(reduce)) { output.writer() ...
$ : language as.keyval(map(keys(kv), values(kv)))
$ : language is.keyval(x)
$ : language map(keys(kv), values(kv))
$ : language c.keyval(lapply(1:num.models, generate.sample))
$ : language f.single(args[[1]])
$ : language lapply(kvs, recycle.keyval)
$ : language FUN(X[[4L]], ...)
$ : language keyval(rmr.recycle(k, v), rmr.recycle(v, k))
$ : language rmr.recycle(k, v)
$ : language rmr.str(lx)
lx
int 1
Dotted pair list of 12
$ : language (function() { load("./rmr-local-env9432cc02004") ...
$ : language rmr2:::map.loop(map = map, keyval.reader = input.reader(), keyval.writer = if (is.null(reduce)) { output.writer() ...
$ : language as.keyval(map(keys(kv), values(kv)))
$ : language is.keyval(x)
$ : language map(keys(kv), values(kv))
$ : language c.keyval(lapply(1:num.models, generate.sample))
$ : language f.single(args[[1]])
$ : language lapply(kvs, recycle.keyval)
$ : language FUN(X[[4L]], ...)
$ : language keyval(rmr.recycle(k, v), rmr.recycle(v, k))
$ : language rmr.recycle(k, v)
$ : language rmr.str(ly)
ly
int 0
Error in rmr.recycle(k, v) : Can't recycle 0-length argument
Calls:
@laserson
Sorry I confused you a bit. The lines
transactions <- read.table(file="../downloads/train.csv",
nrows=20,
come from joinData.R; that is how I reduced the number of samples passed to fitRandomForest.R.
The problem is inside rmr2, as the exception in the trace above shows: Error in rmr.recycle(k, v) : Can't recycle 0-length argument
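One plausible way a map task could end up handing keyval an empty value, given that the trace passes through generate.sample and the mapper is named poisson.subsample, is a per-row Poisson draw that selects no rows when the input block is tiny (the structure dump above shows only 10 observations). This is an assumption, since the body of generate.sample is not shown here. The sketch below uses base R only; `sample_rows`, `frac_per_model`, and the toy data frame are hypothetical stand-ins, not the code in fitRandomForest.R.

```r
# Hypothetical stand-in for one map input block of 10 rows.
block <- data.frame(SalePrice = c(26500, 9500, 19000, 11500, 65000,
                                  24000, 38500, 13500, 21500, 36000))

# Draw a Poisson count per row and repeat each row index that many times.
# With a small rate and few rows, every count can be zero.
sample_rows <- function(df, rate) {
  draws <- rpois(n = nrow(df), lambda = rate)
  df[rep(seq_len(nrow(df)), draws), , drop = FALSE]
}

set.seed(1)
frac_per_model <- 0.01  # assumed small sampling rate, for illustration only
samples <- lapply(1:50, function(i) sample_rows(block, frac_per_model))

# Flag the samples that came back with zero rows.
empty <- vapply(samples, function(s) nrow(s) == 0L, logical(1))
sum(empty)

# A possible guard: keep only non-empty samples before emitting them.
non_empty <- samples[!empty]
```

In a map function, a check like this would sit just before the keyval call. Whether skipping empty samples is the right fix, or whether keyval should accept them, is a question for the rmr2 ticket.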
Issue moved to rmr repo.
The second issue is that if the input data sample is reduced (the example below will only use 20 rows from the overall training set)
transactions <- read.table(file="../downloads/train.csv",
nrows=1000,
running fitRandomForest.R will terminate with the following exception:
Loading required package: randomForest
randomForest 4.6-7
Type rfNews() to see new features/changes/bug fixes.
Loading required package: rmr2
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: methods
Loading required package: bitops
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
Loading required package: reshape2
Dotted pair list of 12
$ : language (function() { load("./rmr-local-envaaeb61a5a326") ...
$ : language rmr2:::map.loop(map = map, keyval.reader = input.reader(), keyval.writer = if (is.null(reduce)) { output.writer() ...
$ : language as.keyval(map(keys(kv), values(kv)))
$ : language is.keyval(x)
$ : language map(keys(kv), values(kv))
$ : language c.keyval(lapply(1:num.models, generate.sample))
$ : language f.single(args[[1]])
$ : language lapply(kvs, recycle.keyval)
$ : language FUN(X[[1L]], ...)
$ : language keyval(rmr.recycle(k, v), rmr.recycle(v, k))
$ : language rmr.recycle(k, v)
$ : language rmr.str(lx)
lx
int 1
Dotted pair list of 12
$ : language (function() { load("./rmr-local-envaaeb61a5a326") ...
$ : language rmr2:::map.loop(map = map, keyval.reader = input.reader(), keyval.writer = if (is.null(reduce)) { output.writer() ...
$ : language as.keyval(map(keys(kv), values(kv)))
$ : language is.keyval(x)
$ : language map(keys(kv), values(kv))
$ : language c.keyval(lapply(1:num.models, generate.sample))
$ : language f.single(args[[1]])
$ : language lapply(kvs, recycle.keyval)
$ : language FUN(X[[1L]], ...)
$ : language keyval(rmr.recycle(k, v), rmr.recycle(v, k))
$ : language rmr.recycle(k, v)
$ : language rmr.str(ly)
ly
int 0
Error in rmr.recycle(k, v) : Can't recycle 0-length argument
Calls: ... c.keyval -> f.single -> lapply -> FUN -> keyval -> rmr.recycle
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)