Closed ercbk closed 4 years ago
PuTTY has a section, "Sending of null packets to keep session active," with the setting "Seconds between keepalives." There's also "Low level TCP connection options" with the setting "Enable TCP keepalives." I'm wondering whether this would help and whether there's a way to take advantage of it through plink.
Maybe you could use the progressr package to signal progress updates from within your future_map() .f expression to keep the connection alive. Such progress updates will basically be relayed back to the main R session as they are produced, cf. https://cran.r-project.org/web/packages/progressr/vignettes/progressr-intro.html. It's a bit of a hack, but at least this will give you some clues on whether it is an SSH timeout or not.
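As a rough illustration of that suggestion, here is a minimal sketch of signalling progress from inside the function passed to a furrr map call. It assumes the future, furrr, and progressr packages are installed; the local multisession workers, the sleep durations, and the names xs and n_steps are placeholders rather than anything from the original script. Whether these relayed updates are enough to keep an idle SSH connection open is exactly the open question here, so treat it as a diagnostic, not a fix.

```r
library(future)
library(furrr)
library(progressr)

# Assumption: two local workers stand in for the remote EC2 cluster.
plan(multisession, workers = 2)

xs <- 1:4        # placeholder inputs
n_steps <- 3     # placeholder number of sub-steps per element

with_progress({
  # One progress tick per sub-step of every element.
  p <- progressor(steps = length(xs) * n_steps)
  res <- future_map_dbl(xs, function(x) {
    for (i in seq_len(n_steps)) {
      Sys.sleep(0.1)   # stand-in for a long-running computation
      p()              # signal a progress update to the main session
    }
    x^2
  })
})

plan(sequential)   # shut the local workers down
print(res)
```

With a remote cluster, the same pattern would go inside the .f passed to future_map() or future_map2(), with plan(cluster, workers = cl) in place of the multisession plan.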
I'm not ready to call victory yet, but I have a strong candidate for a solution. It turns out you can use those settings in PuTTY non-interactively with plink. I ran the basic nested code in the original post (Sys.sleep(160)) with the config below, and it finished after 16 min with no problems.
    cl <- future::makeClusterPSOCK(
      ## Public IP number of EC2 instance
      public_ip,
      ## User name (always 'ubuntu')
      user = "ubuntu",
      ## Use private SSH key registered with AWS
      rshcmd = c("plink", "-ssh", "-load", "futureSettings", "-i", ssh_private_key_file),
      rscript_args = c(
        "-e", shQuote(".libPaths('/home/rstudio/R/x86_64-pc-linux-gnu-library/3.6')")
      ),
      verbose = TRUE
    )
The -load flag loads a saved PuTTY session (e.g. futureSettings), which can include... probably everything listed in the rshcmd argument (and maybe the user argument too), but most importantly, the setting(s) that send those null packets every so often. Here are the steps to do it.
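Separately from those steps, the rshcmd vector itself is easy to get wrong when switching between saved sessions, so a small helper can assemble it. This is only a sketch: plink_rshcmd is a hypothetical name, "path/to/key.ppk" is a placeholder path, and the flags are just the ones already used in the config above (-ssh, -load, -i).

```r
# Hypothetical helper: build the rshcmd vector for a saved PuTTY session.
# Only the flags already used in this thread are emitted.
plink_rshcmd <- function(session, key_file, plink = "plink") {
  stopifnot(
    is.character(session), length(session) == 1L, nzchar(session),
    is.character(key_file), length(key_file) == 1L, nzchar(key_file)
  )
  c(plink, "-ssh", "-load", session, "-i", key_file)
}

# Placeholder key path; substitute the real private key file.
rshcmd <- plink_rshcmd("futureSettings", "path/to/key.ppk")
print(rshcmd)
```

The result can be passed directly as the rshcmd argument of future::makeClusterPSOCK().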
Also, I did try progressr with future_map, and it kind of worked but not as intended. The progress bar displayed and completed, but all at once and at the beginning portion of the execution.
Still have to try the PuTTY settings with the nested-cv script. I'll update this post once that happens.
Update: It works on my nested-cv script! My previous record for longest, successfully completed run was 4.05 minutes. I've now had successful runs of 12 minutes and 21.07 minutes. It's lookin' good. I'm going to go ahead and close the issue. Thank you for the package, Henrik.
I'm running a nested cross-validation script similar to this one on 2 AWS instances. For smaller nested structures (i.e. fewer folds, resamples, and repeats), everything works fine, but for larger structures that have run times of around 13 minutes, future_map2 hangs and eventually ends some time later with an error like "Error in unserialize(node$con) : Failed to retrieve the value of ClusterFuture () from cluster SOCKnode #1 (PID 18309 on ‘13.48.133.115’). The reason reported was ‘error reading from connection’". I'm estimating the run times by watching the processes over SSH on the instances, but the error doesn't appear until 30 or so minutes later. This is the configuration I'm using:
I don't think it can be a RAM issue, because I've run the script on a couple of r5.8xlarge instances and the most RAM that's ever been used is around 8 GB. The SSH logs for failed and successful runs are in my dropbox. There are also logs in there for a run on 2 t3 instances that were on smaller nested structures and have shorter logs, which might be easier to look through.
I've also gotten the same error on basic nested structures that run over 13 min. Think this ran for about 20 min on the t3 instances and errored around 30 mins after that.
I also get an error even if there are 5 min of inactivity. I run makeClusterPSOCK and plan, wait five minutes, run the basic nested code above, and get the error, "Error in serialize(data, node$con) : error writing to connection". Not sure if that's related or not. I ran this test with and without the firewall and antivirus, and the error was the same. I've also looked through the PuTTY/plink options and issues, and nothing looked relevant to my problem.
This feels like a timeout or some other networking issue, but when I SSH through PuTTY, it never times out or disconnects. Is this a non-interactive SSH connection issue? Networking is all magic to me. I would appreciate any help you can offer.
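The idle test described above can be written down as a small script so it is easy to repeat with different settings. This is a sketch under assumptions: idle_test is a hypothetical helper, a single local worker stands in for the EC2 instance, and the idle time is shortened to 2 seconds so the example finishes quickly (the test in this post waited five minutes).

```r
library(future)

# Hypothetical helper: set up the plan, stay idle, then try one trivial future.
# Returns the worker's PID on success or the error message on failure.
idle_test <- function(cl, idle_secs = 300) {
  plan(cluster, workers = cl)
  on.exit(plan(sequential), add = TRUE)
  Sys.sleep(idle_secs)   # no traffic on the connection during this time
  tryCatch(
    value(future(Sys.getpid())),
    error = function(e) conditionMessage(e)
  )
}

# Assumption: one localhost worker; replace with the remote cluster to
# reproduce the SSH case.
cl <- makeClusterPSOCK(1L)
res <- idle_test(cl, idle_secs = 2)
parallel::stopCluster(cl)
print(res)
```

If the connection was dropped while idle, res would hold an error message such as the "error writing to connection" one quoted above instead of an integer PID.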
Windows system info:
OS Name: Microsoft Windows 10 Pro
OS Version: 10.0.18362 N/A Build 18362
Network Card(s): 3 NIC(s) Installed.
[01]: 802.11n USB Wireless LAN Card
Connection Name: Wi-Fi
Status: Media disconnected
[02]: Intel(R) Ethernet Connection I217-LM
Connection Name: Ethernet
DHCP Enabled: No
aws nested-cv session info
```r
- Session info ------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.6.2 (2019-12-12)
 os       Windows 10 x64
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.1252
 ctype    English_United States.1252
 tz       America/New_York
 date     2020-04-14

- Packages ----------------------------------------------------------------------------------------------
 package       * version    date       lib source
 askpass         1.1        2019-01-13 [1] CRAN (R 3.6.1)
 assertthat      0.2.1      2019-03-21 [1] CRAN (R 3.6.1)
 backports       1.1.5      2019-10-02 [1] CRAN (R 3.6.1)
 base64enc       0.1-3      2015-07-28 [1] CRAN (R 3.6.0)
 bayesplot       1.7.1      2019-12-01 [1] CRAN (R 3.6.2)
 boot            1.3-24     2019-12-20 [1] CRAN (R 3.6.2)
 broom         * 0.5.5      2020-02-29 [1] CRAN (R 3.6.3)
 callr           3.4.3      2020-03-28 [1] CRAN (R 3.6.2)
 class           7.3-15     2019-01-01 [2] CRAN (R 3.6.2)
 cli             2.0.2      2020-02-28 [1] CRAN (R 3.6.3)
 clipr           0.7.0      2019-07-23 [1] CRAN (R 3.6.1)
 codetools       0.2-16     2018-12-24 [2] CRAN (R 3.6.2)
 colorspace      1.4-1      2019-03-18 [1] CRAN (R 3.6.1)
 colourpicker    1.0        2017-09-27 [1] CRAN (R 3.6.1)
 crayon          1.3.4      2017-09-16 [1] CRAN (R 3.6.1)
 crosstalk       1.1.0.1    2020-03-13 [1] CRAN (R 3.6.3)
 curl            4.3        2019-12-02 [1] CRAN (R 3.6.2)
 data.table    * 1.12.8     2019-12-09 [1] CRAN (R 3.6.2)
 desc            1.2.0      2018-05-01 [1] CRAN (R 3.6.1)
 details       * 0.2.1      2020-01-12 [1] CRAN (R 3.6.2)
 dials         * 0.0.4      2019-12-02 [1] CRAN (R 3.6.2)
 DiceDesign      1.8-1      2019-07-31 [1] CRAN (R 3.6.1)
 digest          0.6.25     2020-02-23 [1] CRAN (R 3.6.2)
 dplyr         * 0.8.5      2020-03-07 [1] CRAN (R 3.6.3)
 DT              0.13       2020-03-23 [1] CRAN (R 3.6.3)
 dtplyr        * 1.0.1      2020-01-23 [1] CRAN (R 3.6.2)
 dygraphs        1.1.1.6    2018-07-11 [1] CRAN (R 3.6.1)
 ellipsis        0.3.0      2019-09-20 [1] CRAN (R 3.6.1)
 evaluate        0.14       2019-05-28 [1] CRAN (R 3.6.1)
 fansi           0.4.1      2020-01-08 [1] CRAN (R 3.6.2)
 fastmap         1.0.1      2019-10-08 [1] CRAN (R 3.6.1)
 foreach         1.4.8      2020-02-09 [1] CRAN (R 3.6.2)
 forge           0.2.0      2019-02-26 [1] CRAN (R 3.6.1)
 furrr         * 0.1.0      2018-05-16 [1] CRAN (R 3.6.1)
 future        * 1.16.0     2020-01-16 [1] CRAN (R 3.6.2)
 generics        0.0.2      2018-11-29 [1] CRAN (R 3.6.1)
 ggplot2       * 3.3.0.9000 2020-04-04 [1] Github (tidyverse/ggplot2@bca6105)
 ggridges        0.5.2      2020-01-12 [1] CRAN (R 3.6.2)
 globals         0.12.5     2019-12-07 [1] CRAN (R 3.6.1)
 glue          * 1.4.0      2020-04-03 [1] CRAN (R 3.6.2)
 gower           0.2.1      2019-05-14 [1] CRAN (R 3.6.1)
 GPfit           1.0-8      2019-02-08 [1] CRAN (R 3.6.2)
 gridExtra       2.3        2017-09-09 [1] CRAN (R 3.6.1)
 gtable          0.3.0      2019-03-25 [1] CRAN (R 3.6.1)
 gtools          3.8.1      2018-06-26 [1] CRAN (R 3.6.0)
 htmltools       0.4.0      2019-10-04 [1] CRAN (R 3.6.1)
 htmlwidgets     1.5.1      2019-10-08 [1] CRAN (R 3.6.1)
 httpuv          1.5.2      2019-09-11 [1] CRAN (R 3.6.1)
 httr            1.4.1      2019-08-05 [1] CRAN (R 3.6.1)
 igraph          1.2.5      2020-03-19 [1] CRAN (R 3.6.3)
 infer         * 0.5.1      2019-11-19 [1] CRAN (R 3.6.2)
 ini             0.3.1      2018-05-20 [1] CRAN (R 3.6.1)
 inline          0.3.15     2018-05-18 [1] CRAN (R 3.6.1)
 ipred           0.9-9      2019-04-28 [1] CRAN (R 3.6.1)
 iterators       1.0.12     2019-07-26 [1] CRAN (R 3.6.1)
 janeaustenr     0.1.5      2017-06-10 [1] CRAN (R 3.6.1)
 jsonlite        1.6.1      2020-02-02 [1] CRAN (R 3.6.2)
 knitr           1.28       2020-02-06 [1] CRAN (R 3.6.2)
 later           1.0.0      2019-10-04 [1] CRAN (R 3.6.1)
 lattice         0.20-38    2018-11-04 [2] CRAN (R 3.6.2)
 lava            1.6.7      2020-03-05 [1] CRAN (R 3.6.3)
 lhs             1.0.1      2019-02-03 [1] CRAN (R 3.6.1)
 lifecycle       0.2.0      2020-03-06 [1] CRAN (R 3.6.3)
 listenv         0.8.0      2019-12-05 [1] CRAN (R 3.6.2)
 lme4            1.1-21     2019-03-05 [1] CRAN (R 3.6.1)
 loo             2.2.0      2019-12-19 [1] CRAN (R 3.6.2)
 lubridate       1.7.4      2018-04-11 [1] CRAN (R 3.6.1)
 magrittr        1.5        2014-11-22 [1] CRAN (R 3.6.1)
 markdown        1.1        2019-08-07 [1] CRAN (R 3.6.1)
 MASS            7.3-51.4   2019-03-31 [2] CRAN (R 3.6.2)
 Matrix          1.2-18     2019-11-27 [2] CRAN (R 3.6.2)
 matrixStats     0.56.0     2020-03-13 [1] CRAN (R 3.6.3)
 mime            0.9        2020-02-04 [1] CRAN (R 3.6.2)
 miniUI          0.1.1.1    2018-05-18 [1] CRAN (R 3.6.1)
 minqa           1.2.4      2014-10-09 [1] CRAN (R 3.6.1)
 mlflow        * 1.7.0      2020-03-03 [1] CRAN (R 3.6.3)
 munsell         0.5.0      2018-06-12 [1] CRAN (R 3.6.1)
 nlme            3.1-145    2020-03-04 [1] CRAN (R 3.6.3)
 nloptr          1.2.2.1    2020-03-11 [1] CRAN (R 3.6.3)
 nnet            7.3-12     2016-02-02 [2] CRAN (R 3.6.2)
 openssl         1.4.1      2019-07-18 [1] CRAN (R 3.6.1)
 pacman          0.5.1      2019-03-11 [1] CRAN (R 3.6.1)
 parsnip       * 0.0.5      2020-01-07 [1] CRAN (R 3.6.2)
 pillar          1.4.3      2019-12-20 [1] CRAN (R 3.6.2)
 pkgbuild        1.0.6      2019-10-09 [1] CRAN (R 3.6.1)
 pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 3.6.1)
 plyr            1.8.6      2020-03-03 [1] CRAN (R 3.6.3)
 png             0.1-7      2013-12-03 [1] CRAN (R 3.6.0)
 prettyunits     1.1.1      2020-01-24 [1] CRAN (R 3.6.2)
 pROC            1.16.2     2020-03-19 [1] CRAN (R 3.6.3)
 processx        3.4.2      2020-02-09 [1] CRAN (R 3.6.2)
 prodlim         2019.11.13 2019-11-17 [1] CRAN (R 3.6.2)
 promises        1.1.0      2019-10-04 [1] CRAN (R 3.6.1)
 ps              1.3.2      2020-02-13 [1] CRAN (R 3.6.3)
 purrr         * 0.3.3      2019-10-18 [1] CRAN (R 3.6.2)
 R6              2.4.1      2019-11-12 [1] CRAN (R 3.6.2)
 ranger        * 0.12.1     2020-01-10 [1] CRAN (R 3.6.2)
 Rcpp            1.0.4      2020-03-17 [1] CRAN (R 3.6.3)
 recipes       * 0.1.10     2020-03-18 [1] CRAN (R 3.6.3)
 reshape2        1.4.3      2017-12-11 [1] CRAN (R 3.6.1)
 reticulate      1.14       2019-12-17 [1] CRAN (R 3.6.2)
 rlang           0.4.5      2020-03-01 [1] CRAN (R 3.6.3)
 rmarkdown       2.1        2020-01-20 [1] CRAN (R 3.6.2)
 rpart           4.1-15     2019-04-12 [2] CRAN (R 3.6.2)
 rprojroot       1.3-2      2018-01-03 [1] CRAN (R 3.6.3)
 RPushbullet   * 0.3.3      2020-01-19 [1] CRAN (R 3.6.2)
 rsample       * 0.0.5      2019-07-12 [1] CRAN (R 3.6.1)
 rsconnect       0.8.16     2019-12-13 [1] CRAN (R 3.6.2)
 rstan           2.19.3     2020-02-11 [1] CRAN (R 3.6.3)
 rstanarm        2.19.3     2020-02-11 [1] CRAN (R 3.6.3)
 rstantools      2.0.0      2019-09-15 [1] CRAN (R 3.6.1)
 rstudioapi      0.11       2020-02-07 [1] CRAN (R 3.6.3)
 scales        * 1.1.0      2019-11-18 [1] CRAN (R 3.6.2)
 sessioninfo     1.1.1      2018-11-05 [1] CRAN (R 3.6.1)
 shiny           1.4.0.2    2020-03-13 [1] CRAN (R 3.6.3)
 shinyjs         1.1        2020-01-13 [1] CRAN (R 3.6.2)
 shinystan       2.5.0      2018-05-01 [1] CRAN (R 3.6.1)
 shinythemes     1.1.2      2018-11-06 [1] CRAN (R 3.6.1)
 SnowballC       0.6.0      2019-01-15 [1] CRAN (R 3.6.0)
 StanHeaders     2.21.0-1   2020-01-19 [1] CRAN (R 3.6.2)
 stringi         1.4.6      2020-02-17 [1] CRAN (R 3.6.2)
 stringr         1.4.0      2019-02-10 [1] CRAN (R 3.6.1)
 survival        3.1-11     2020-03-07 [1] CRAN (R 3.6.3)
 swagger         3.9.2      2018-03-23 [1] CRAN (R 3.6.0)
 threejs         0.3.3      2020-01-21 [1] CRAN (R 3.6.2)
 tibble        * 3.0.0      2020-03-30 [1] CRAN (R 3.6.2)
 tictoc        * 1.0        2014-06-17 [1] CRAN (R 3.6.0)
 tidymodels    * 0.1.0      2020-02-16 [1] CRAN (R 3.6.3)
 tidyposterior   0.0.2      2018-11-15 [1] CRAN (R 3.6.1)
 tidypredict     0.4.5      2020-02-10 [1] CRAN (R 3.6.3)
 tidyr         * 1.0.2      2020-01-24 [1] CRAN (R 3.6.2)
 tidyselect      1.0.0      2020-01-27 [1] CRAN (R 3.6.2)
 tidytext        0.2.3      2020-03-04 [1] CRAN (R 3.6.3)
 timeDate        3043.102   2018-02-21 [1] CRAN (R 3.6.0)
 tokenizers      0.2.1      2018-03-29 [1] CRAN (R 3.6.1)
 tune          * 0.0.1      2020-01-02 [1] Github (tidymodels/tune@e044702)
 vctrs           0.2.4      2020-03-10 [1] CRAN (R 3.6.3)
 withr           2.1.2      2018-03-15 [1] CRAN (R 3.6.1)
 workflows     * 0.1.1      2020-03-17 [1] CRAN (R 3.6.3)
 xfun            0.12       2020-01-13 [1] CRAN (R 3.6.2)
 xml2            1.2.5      2020-03-11 [1] CRAN (R 3.6.3)
 xtable          1.8-4      2019-04-21 [1] CRAN (R 3.6.1)
 xts             0.12-0     2020-01-19 [1] CRAN (R 3.6.2)
 yaml            2.2.1      2020-02-01 [1] CRAN (R 3.6.2)
 yardstick     * 0.0.6      2020-03-17 [1] CRAN (R 3.6.3)
 zeallot         0.1.0      2018-01-28 [1] CRAN (R 3.6.1)
 zoo             1.8-7      2020-01-10 [1] CRAN (R 3.6.2)

[1] C:/Users/tbats/Documents/R/win-library/3.6
[2] C:/Program Files/R/R-3.6.2/library
```