kchaitanyabandi closed this issue 6 years ago.
Hey @kchaitanyabandi
Installing caret from GitHub is slightly different because its R package lives in a subdirectory of the GitHub repo, in this case pkg/caret in https://github.com/topepo/caret.
For GitHub installation, the cluster configuration file needs the path of each package to install on every node. The most common form is owner/repo (for example, Azure/doAzureParallel); when the package sits in a subdirectory, append that path, as in topepo/caret/pkg/caret. You only need a GitHub authentication token if you are installing from a private repo; otherwise you can leave it blank.
I've added a sample cluster config file.
Here's a link to our documentation for GitHub installation: https://github.com/Azure/doAzureParallel/tree/master/samples/package_management
I can also test it out if you have a sample dataset. Let me know if you have any more questions!
Thanks, Brian
{
  "name": "caret-pool",
  "vmSize": "Standard_F2",
  "maxTasksPerNode": 2,
  "poolSize": {
    "dedicatedNodes": {
      "min": 0,
      "max": 0
    },
    "lowPriorityNodes": {
      "min": 2,
      "max": 2
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "rocker/tidyverse:latest",
  "rPackages": {
    "cran": [],
    "github": ["topepo/caret/pkg/caret", "Azure/doAzureParallel"],
    "bioconductor": []
  },
  "commandLine": []
}
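For completeness, here is a hedged sketch of how a config file like the one above is typically consumed from R. It assumes the JSON is saved as cluster.json next to a credentials.json file. The helper name startCaretCluster is hypothetical; setCredentials, makeCluster and registerDoAzureParallel are doAzureParallel functions, referenced with :: so that defining the helper does not require the package to be attached.

```r
# Hypothetical helper: provision the pool described in cluster.json and
# register it as the foreach backend that caret::train() will use.
startCaretCluster <- function(credentials = "credentials.json",
                              config = "cluster.json") {
  # Batch/Storage account keys (and, for private repos, a GitHub token)
  doAzureParallel::setCredentials(credentials)
  # Creates the pool; entries under rPackages$github such as
  # "topepo/caret/pkg/caret" are installed on every node
  cluster <- doAzureParallel::makeCluster(config)
  # From here on, %dopar% loops (including caret's resampling) run on the pool
  doAzureParallel::registerDoAzureParallel(cluster)
  cluster
}

# Usage (requires an Azure Batch account):
# mycluster <- startCaretCluster()
# foreach::getDoParWorkers()
```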
Hey @brnleehng
I used the sample cluster config file you posted, and the latest development version of caret was installed. However, the problem persists, and importantly it only fails for multi-class classification data; binary classification and regression work perfectly fine.
I can't share my data for confidentiality reasons, but you could try any dataset with more than two class labels. Here is the code I am running:
library(caret)
library(doAzureParallel)
ctrl_gbm <- trainControl(method = "repeatedcv",
                         number = 10,
                         repeats = 5,
                         summaryFunction = multiClassSummary,
                         classProbs = TRUE,
                         verboseIter = TRUE)
# These columns are xgboost tuning parameters, so the grid matches method = "xgbTree"
gbmGrid <- expand.grid(nrounds = 100,
                       max_depth = 5,
                       eta = .05,
                       gamma = 0,
                       colsample_bytree = c(.6, .7),
                       min_child_weight = 1,
                       subsample = .8)
registerDoAzureParallel(mycluster)
tuned_fit_xgb <- train(x = xtrain,
                       y = ytrain,
                       method = "xgbTree",
                       verbose = TRUE,
                       metric = "logLoss",
                       trControl = ctrl_gbm,
                       tuneGrid = gbmGrid)
Hi @kchaitanyabandi
I was able to reproduce the error with the caret sample; it was caused by a missing algorithm package (the randomForest R package). To avoid missing algorithm packages, you can use a caret Docker image that already has many of them installed:
https://hub.docker.com/r/jrowen/dcaret/~/dockerfile/
This image already includes the ranger and glm R packages.
Error: names(resamples) <- gsub("^\\.", "", names(resamples)) : attempt to set an attribute on NULL
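Since the failure came from an algorithm package that was not on the nodes, another way to avoid it (alongside or instead of the Docker image) is to ask caret which packages each method needs and list them under rPackages$cran in cluster.json. A minimal sketch: requiredPackages is a hypothetical helper that relies on caret's getModelInfo(), whose entries carry a library field. The getInfo argument exists only so the lookup can be swapped out when testing.

```r
# Hypothetical helper: collect the R packages caret needs for a set of methods.
requiredPackages <- function(methods, getInfo = caret::getModelInfo) {
  pkgs <- lapply(methods, function(m) {
    # regex = FALSE requests an exact match on the method name
    getInfo(m, regex = FALSE)[[1]]$library
  })
  # Several methods share dependencies, so drop duplicates
  unique(unlist(pkgs))
}

# Usage, e.g. to fill "cran": [...] in cluster.json:
# requiredPackages(c("gbm", "ranger"))
```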
Hey Brian,
You're awesome, that solved the issue. Thank you so much for the quick debugging and reply. How did you find the error that said a package was missing?
Thanks Krishna
@kchaitanyabandi
Thanks for the response! We added the cluster config file to our samples in #237.
Using the job ID printed on the console, you can navigate to the job in the Azure Portal or in BatchLabs (our tool for monitoring Batch jobs, probably the easiest way to navigate).
Open the job's tab in the Azure Portal or BatchLabs to see a list of tasks. Clicking one of the tasks shows a folder containing stdout.txt, stderr.txt, and [Task Id].txt.
Opening [Task Id].txt shows the R console output.
Hey @brnleehng
The fix you provided worked for a while, but then I got the error again with the following grid for multi-class classification:
gbmGrid <- expand.grid(interaction.depth = 10:20,
                       n.trees = c(100, 150, 200, 250, 300, 350, 400, 450, 500, 1000, 1175, 1250, 1300),
                       shrinkage = c(0.025, .05, .1, 0.2, 0.3),
                       n.minobsinnode = c(5:10, 20, 30))
ctrl_gbm <- trainControl(method = "repeatedcv",
                         number = 10,
                         repeats = 5,
                         summaryFunction = multiClassSummary,
                         classProbs = TRUE,
                         verboseIter = TRUE)
tuned_fit_gbm <- train(x = train_data[, names(train_data) != dep_var],
                       y = train_data[, names(train_data) == dep_var],
                       method = "gbm",
                       verbose = TRUE,
                       metric = "logLoss",
                       trControl = ctrl_gbm,
                       tuneGrid = gbmGrid)
The grid submitted a total of 22000 tasks to the Batch pool, and the same error
Error in names(resamples) <- gsub("^\\.", "", names(resamples)) : attempt to set an attribute on NULL
appeared after 239 tasks had completed successfully. I'm not sure why; I still need to check the logs in BatchLabs as you suggested.
I'm still getting the login credentials for the Azure Batch account. In the meantime, is there a way to view those files from the R console?
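The 22000 figure can be checked by hand. This sketch assumes caret's "submodel" behaviour for gbm: n.trees is not fit separately, so for each combination of the other three parameters only the largest n.trees is trained and the smaller values are predicted from that fit (the task log later in this thread, which shows n.trees=1300 being fit, is consistent with this). It also assumes one Batch task per (resample, fitted combination) pair.

```r
# Tuning grid from the post above
gbmGrid <- expand.grid(interaction.depth = 10:20,
                       n.trees = c(100, 150, 200, 250, 300, 350, 400, 450,
                                   500, 1000, 1175, 1250, 1300),
                       shrinkage = c(0.025, .05, .1, 0.2, 0.3),
                       n.minobsinnode = c(5:10, 20, 30))
n_grid <- nrow(gbmGrid)            # 11 * 13 * 5 * 8 = 5720 candidate rows

# Assumption: n.trees is a submodel parameter, so one model is fit per
# unique combination of the remaining three parameters
fitted <- unique(gbmGrid[, c("interaction.depth", "shrinkage", "n.minobsinnode")])
n_fits <- nrow(fitted)             # 11 * 5 * 8 = 440

# repeatedcv with number = 10 and repeats = 5 gives 50 resamples
n_resamples <- 10 * 5
n_tasks <- n_fits * n_resamples    # 440 * 50 = 22000
```

Under the same assumption, shrinking any of the non-n.trees dimensions, or the number of repeats, reduces the task count proportionally.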
Hi @kchaitanyabandi
To get files from a job, you can use the getJobFile API. We have some documentation on getting files from the node: Link
To get the logs, you need the job ID and the ID of the task that failed. Fetch [Task Id].txt (for example, 1.txt), because it contains the output of your R program.
library(doAzureParallel)
# Get the logs from task 1 that was run by R
taskLogs <- getJobFile("job20180322051216", "1", "wd/1.txt")
cat(taskLogs)
# Get the stdout output from task 2
stdoutLogs <- getJobFile("job20180322051216", "2", "stdout.txt")
cat(stdoutLogs)
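When thousands of tasks are involved, it can help to pull the [Task Id].txt logs for a range of task IDs in one go. A minimal sketch, assuming getJobFile returns the file contents as in the example above; fetchTaskLogs is a hypothetical helper, and getFile is a parameter only so the download function can be replaced in tests. If a download raises an error, that task is recorded as NA instead of stopping the loop.

```r
# Hypothetical helper: fetch the R console log (wd/<taskId>.txt) for several tasks.
fetchTaskLogs <- function(jobId, taskIds, getFile = doAzureParallel::getJobFile) {
  logs <- lapply(taskIds, function(id) {
    tryCatch(
      getFile(jobId, as.character(id), paste0("wd/", id, ".txt")),
      # Record NA and continue if fetching this task's log raises an error
      error = function(e) NA_character_
    )
  })
  names(logs) <- as.character(taskIds)
  logs
}

# Usage: inspect the tasks around the point where the job failed
# logs <- fetchTaskLogs("job20180322051216", 235:245)
# cat(logs[["240"]])
```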
Here's the structure of the job directory on the node:
Thanks, Brian
Hey Brian,
I checked the logs of the failed tasks, but I can't tell what went wrong. Below is the output, which ends with "Error Code: 0". Could you share any insight into what might have gone wrong?
[1] "argsList" "bioconductor" "cloudCombine"
[4] "enableCloudCombine" "exportenv" "expr"
[7] "github" "packages" "pkgName"
[1] "caret"
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets base
other attached packages:
[1] caret_6.0-79 ggplot2_2.2.1 lattice_0.20-35
loaded via a namespace (and not attached):
[1] tidyselect_0.2.4 purrr_0.2.4 reshape2_1.4.3 kernlab_0.9-25
[5] splines_3.4.2 colorspace_1.3-2 stats4_3.4.2 survival_2.41-3
[9] prodlim_1.6.1 rlang_0.2.0 ModelMetrics_1.1.0 pillar_1.2.1
[13] foreign_0.8-69 glue_1.2.0 withr_2.1.2 bindrcpp_0.2
[17] foreach_1.4.4 bindr_0.1 plyr_1.8.4 dimRed_0.1.0
[21] lava_1.5.1 robustbase_0.92-8 stringr_1.3.0 timeDate_3043.102
[25] munsell_0.4.3 gtable_0.2.0 recipes_0.1.2 codetools_0.2-15
[29] psych_1.7.8 parallel_3.4.2 class_7.3-14 DEoptimR_1.0-8
[33] broom_0.4.3 methods_3.4.2 Rcpp_0.12.16 scales_0.5.0
[37] ipred_0.9-6 CVST_0.2-1 mnormt_1.5-5 stringi_1.1.7
[41] dplyr_0.7.4 RcppRoll_0.2.2 ddalpha_1.3.1.1 grid_3.4.2
[45] tools_3.4.2 magrittr_1.5 lazyeval_0.2.0 tibble_1.4.2
[49] tidyr_0.8.0 DRR_0.0.2 pkgconfig_2.0.1 MASS_7.3-47
[53] Matrix_1.2-11 lubridate_1.7.3 gower_0.1.2 assertthat_0.2.0
[57] iterators_1.0.8 R6_2.2.2 rpart_4.1-11 sfsmisc_1.1-2
[61] nnet_7.3-12 nlme_3.1-131 compiler_3.4.2
+ Fold01.Rep1: shrinkage=0.100, interaction.depth=18, n.minobsinnode= 6, n.trees=1300
Iter TrainDeviance ValidDeviance StepSize Improve
1 1.0986 -nan 0.1000 0.1951
2 0.9683 -nan 0.1000 0.1396
3 0.8747 -nan 0.1000 0.1040
4 0.8033 -nan 0.1000 0.0821
5 0.7463 -nan 0.1000 0.0658
6 0.7020 -nan 0.1000 0.0546
7 0.6636 -nan 0.1000 0.0442
8 0.6314 -nan 0.1000 0.0367
9 0.6049 -nan 0.1000 0.0349
10 0.5803 -nan 0.1000 0.0279
20 0.4431 -nan 0.1000 0.0082
40 0.3491 -nan 0.1000 0.0029
60 0.3006 -nan 0.1000 -0.0001
80 0.2667 -nan 0.1000 0.0005
100 0.2443 -nan 0.1000 -0.0007
120 0.2280 -nan 0.1000 -0.0010
140 0.2145 -nan 0.1000 -0.0007
160 0.2043 -nan 0.1000 -0.0007
180 0.1957 -nan 0.1000 -0.0015
200 0.1885 -nan 0.1000 -0.0011
220 0.1823 -nan 0.1000 -0.0021
240 0.1768 -nan 0.1000 -0.0017
260 0.1727 -nan 0.1000 -0.0019
280 0.1689 -nan 0.1000 -0.0011
300 0.1657 -nan 0.1000 -0.0020
320 0.1627 -nan 0.1000 -0.0019
340 0.1604 -nan 0.1000 -0.0017
360 0.1581 -nan 0.1000 -0.0018
380 0.1559 -nan 0.1000 -0.0015
400 0.1541 -nan 0.1000 -0.0015
420 0.1523 -nan 0.1000 -0.0013
440 0.1508 -nan 0.1000 -0.0018
460 0.1491 -nan 0.1000 -0.0022
480 0.1477 -nan 0.1000 -0.0019
500 0.1463 -nan 0.1000 -0.0015
520 0.1452 -nan 0.1000 -0.0020
540 0.1440 -nan 0.1000 -0.0015
560 0.1432 -nan 0.1000 -0.0011
580 0.1418 -nan 0.1000 -0.0019
600 0.1410 -nan 0.1000 -0.0013
620 0.1399 -nan 0.1000 -0.0018
640 0.1390 -nan 0.1000 -0.0016
660 0.1382 -nan 0.1000 -0.0015
680 0.1372 -nan 0.1000 -0.0014
700 0.1366 -nan 0.1000 -0.0012
720 0.1359 -nan 0.1000 -0.0015
740 0.1352 -nan 0.1000 -0.0017
760 0.1345 -nan 0.1000 -0.0016
780 0.1339 -nan 0.1000 -0.0016
800 0.1333 -nan 0.1000 -0.0017
820 0.1327 -nan 0.1000 -0.0015
840 0.1321 -nan 0.1000 -0.0023
860 0.1316 -nan 0.1000 -0.0017
880 0.1313 -nan 0.1000 -0.0013
900 0.1306 -nan 0.1000 -0.0023
920 0.1302 -nan 0.1000 -0.0018
940 0.1298 -nan 0.1000 -0.0017
960 0.1292 -nan 0.1000 -0.0019
980 0.1289 -nan 0.1000 -0.0019
1000 0.1284 -nan 0.1000 -0.0016
1020 0.1282 -nan 0.1000 -0.0017
1040 0.1279 -nan 0.1000 -0.0024
1060 0.1275 -nan 0.1000 -0.0019
1080 0.1273 -nan 0.1000 -0.0022
1100 0.1270 -nan 0.1000 -0.0014
1120 0.1266 -nan 0.1000 -0.0016
1140 0.1264 -nan 0.1000 -0.0019
1160 0.1261 -nan 0.1000 -0.0014
1180 0.1257 -nan 0.1000 -0.0015
1200 0.1254 -nan 0.1000 -0.0019
1220 0.1251 -nan 0.1000 -0.0025
1240 0.1249 -nan 0.1000 -0.0019
1260 0.1244 -nan 0.1000 -0.0012
1280 0.1241 -nan 0.1000 -0.0018
1300 0.1239 -nan 0.1000 -0.0018
- Fold01.Rep1: shrinkage=0.100, interaction.depth=18, n.minobsinnode= 6, n.trees=1300
Error Code: 0
Based on the logs, it looks like no errors occurred.
Does BatchLabs show errors for the tasks? Can you check the stderr.txt and stdout.txt files? Is '-nan' an expected value for ValidDeviance?
If you have a sample dataset and a working script that I can use to reproduce the problem, that would be helpful (I'm having a tough time finding a suitable dataset).
Brian
Hey Brian,
Please send your email address to bandi014@umn.edu and I'll send you the sample dataset.
Thanks Krishna
Working through this offline.
Hi,
I am trying to train gbm and ranger models with caret's train function using the doAzureParallel backend, but I get this error:
Error in names(resamples) <- gsub("^\\.", "", names(resamples)) : attempt to set an attribute on NULL
This issue has also been reported at https://github.com/topepo/caret/issues/62, but I couldn't find a solution there. I'm not sure whether the problem is in caret or in doAzureParallel.
I then tried installing the development version of caret to see if the problem persists, but I'm confused about how to install GitHub versions of R packages on the worker nodes.
Could anyone point me to documentation on specifying package names in "cluster.json" so they are installed on the nodes from GitHub? I entered the GitHub authentication token in the credentials.json file and listed the package repository path under "github": [ ] in "cluster.json", but I'm not sure whether the packages are actually being installed from GitHub.
I searched the web extensively for documentation but couldn't find it, so I had to break the issue template rules. Sorry about that; any help would be greatly appreciated.
Example Code I'm running:
registerDoAzureParallel(azure_cluster_krishna)
My Session Info:
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] splines parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doAzureParallel_0.6.2 gbm_2.1.3 survival_2.41-3 gower_0.1.2
[5] dimRed_0.1.0 DRR_0.0.2 CVST_0.2-1 Matrix_1.2-12
[9] kernlab_0.9-25 DEoptimR_1.0-8 ddalpha_1.3.1 sfsmisc_1.1-1
[13] robustbase_0.92-8 class_7.3-14 pkgconfig_2.0.1 glue_1.2.0
[17] bindrcpp_0.2 assertthat_0.2.0 RcppRoll_0.2.2 ModelMetrics_1.1.0
[21] lazyeval_0.2.1 munsell_0.4.3 mime_0.5 stringdist_0.9.4.6
[25] wavethresh_4.6.8 Metrics_0.1.3 Cubist_0.2.1 plyr_1.8.4
[29] ClusterR_1.0.9 gtools_3.5.0 cluster_2.0.6 lsr_0.5
[33] MASS_7.3-47 doSNOW_1.0.15 snow_0.4-2 ranger_0.8.0
[37] randomForest_4.6-12 mice_2.46.0 stringr_1.2.0 fscaret_0.9.4.1
[41] hmeasure_1.0 gsubfn_0.6-6 proto_1.0.0 caret_6.0-79
[45] ggplot2_2.2.1 lattice_0.20-35 doParallel_1.0.11 iterators_1.0.9
[49] foreach_1.4.4 dplyr_0.7.4 data.table_1.10.4-3
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 rjson_0.2.15 prodlim_1.6.1 lubridate_1.7.1 codetools_0.2-15
[6] mnormt_1.5-5 ade4_1.7-8 jsonlite_1.5 broom_0.4.3 png_0.1-7
[11] FD_1.0-12 shiny_1.0.5 compiler_3.4.3 httr_1.3.1 htmltools_0.3.6
[16] tools_3.4.3 gmp_0.5-13.1 gtable_0.2.0 reshape2_1.4.3 Rcpp_0.12.14
[21] gdata_2.18.0 ape_5.0 nlme_3.1-131 psych_1.7.8 timeDate_3042.101
[26] devtools_1.13.4 MLmetrics_1.1.1 scales_0.5.0 ipred_0.9-6 rAzureBatch_0.5.6
[31] curl_3.0 yaml_2.1.15 memoise_1.1.0 rpart_4.1-11 stringi_1.1.6
[36] e1071_1.6-8 permute_0.9-4 tiff_0.1-5 caTools_1.17.1 lava_1.5.1
[41] geometry_0.3-6 bitops_1.0-6 rlang_0.1.4 ROCR_1.0-7 purrr_0.2.4
[46] bindr_0.1 OpenImageR_1.0.7 recipes_0.1.2 tidyselect_0.2.3 magrittr_1.5
[51] R6_2.2.2 gplots_3.0.1 foreign_0.8-69 withr_2.1.1 mgcv_1.8-22
[56] RCurl_1.95-4.10 nnet_7.3-12 tibble_1.3.4 KernSmooth_2.23-15 xgboost_0.6.4.1
[61] jpeg_0.1-8 grid_3.4.3 vegan_2.4-5 digest_0.6.15 xtable_1.8-2
[66] tidyr_0.7.2 httpuv_1.3.5 stats4_3.4.3 magic_1.5-6 tcltk_3.4.3