I've been struggling to find relevant answers that might help solve this problem, and after hours of googling I feel I'm out of luck. I'm wondering if someone could point out where the problem might be.
I'm running parallel jobs with future + doFuture over simple data.table code.
I'm using one node with 122 cores on the Slurm server, using
This launches RStudio Server (open-source version), and I connect to it over SSH, using the connection info written to the rserver.log file:
Create an SSH tunnel with:
ssh -N -L 8080:c0706a-s27.dsfcf:33209 username@servername.edu
Then, open in the local browser:
http://localhost:8080
Below is my R setup:
library(tidyverse)
library(data.table)
library(doFuture)
library(progressr)
library(fst)
library(fasttime)
handlers(global = TRUE)
handlers("progress")
options(future.globals.maxSize = 1e20)
options(future.gc = TRUE)
availableCores()
availableWorkers()
plan(cluster, workers = 120)
# plan(multisession, workers = 120) # Also tried with multisession as well
registerDoFuture()
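(Aside: this trivial round-trip is not part of my real workload; it's a minimal sketch, assuming a small multisession plan, that can be used to confirm the workers launch and can send results back at all before running the heavy job.)

```r
# Minimal smoke test (illustrative only): confirm workers can
# receive globals and return results before running the real job.
library(doFuture)      # Depends: foreach, future
registerDoFuture()
plan(multisession, workers = 2)  # deliberately small worker count

res <- foreach(i = 1:4) %dopar% {
  i * 2  # trivial work; a crash here points at the cluster setup, not fread()
}
unlist(res)  # 2 4 6 8
```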
The process basically reads lots of CSV files and filters them in parallel. Here's my code:
csv_parser = function(folder_address, root_symbol = NULL, out_path = NULL, test = FALSE, type = 1){
  # build an `unzip -p` command for each file
  filepath_list = str_c('unzip -p ', list.files(folder_address, full.names = TRUE))
  if (test == TRUE) {filepath_list = filepath_list[1:5]}
  # read each file as a data.table and append it to a list
  p <- progressor(along = filepath_list)
  list_df <- foreach(x = seq_along(filepath_list)) %dopar% {
    p(sprintf("x=%g", x))
    DT = fread(cmd = filepath_list[[x]], fill = TRUE)
    if (!is.null(root_symbol)) {
      DT = DT[root %chin% root_symbol]
    }
    gc()
    return(DT)
  }
  if (is.null(out_path)) {
    result = rbindlist(list_df, fill = TRUE)
    # setnames(result, clean_names)
    return(result)
  } else {
    full_DT = rbindlist(list_df, fill = TRUE)
    # setnames(full_DT, clean_names)
    write.fst(full_DT, out_path, compress = 100)
  }
}
csv_parser(folder_address, root_symbol = snp500_tickers, out_path = 'some/path')
It starts up multiple subprocesses but crashes within a couple of minutes. The error says:
Error in unserialize(node$con) :
ClusterFuture (doFuture-2) failed to receive results from cluster RichSOCKnode #2 (PID 56721 on ‘localhost’).
The reason reported was ‘error reading from connection’. Post-mortem diagnostic:
The total size of the 9 globals exported is 145.86 KiB.
The three largest globals are ‘filepath_list’ (101.57 KiB of class ‘character’), ‘root_symbol’ (35.55 KiB of class ‘character’) and ‘p’ (5.38 KiB of class ‘function’)
It doesn't seem related to the size of the globals, but I made sure anyway with options(future.globals.maxSize = 1e20).
I tried both plan(multisession, workers = 120) and plan(cluster, workers = 120), but both yielded the same error.
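To narrow it down, here is a self-contained miniature of the pipeline (synthetic CSVs in a temp directory, 2 workers, multisession plan — all of these are illustrative choices, not my real data or settings). If even this small version dies with ‘error reading from connection’, the problem is in the cluster setup rather than in the real CSVs.

```r
# Miniature of the read-filter-bind pipeline with synthetic data.
library(data.table)
library(doFuture)
registerDoFuture()
plan(multisession, workers = 2)

# write four tiny CSVs, each with one matching and one non-matching row
out_dir <- tempdir()
files <- file.path(out_dir, sprintf("part%d.csv", 1:4))
for (f in files) fwrite(data.table(root = c("AAPL", "ZZZ"), x = rnorm(2)), f)

keep <- "AAPL"
parts <- foreach(f = files) %dopar% {
  DT <- fread(f)              # plain file read; the real code uses cmd = "unzip -p ..."
  DT[root %chin% keep]        # same filtering step as in csv_parser()
}
result <- rbindlist(parts, fill = TRUE)
nrow(result)  # 4: one "AAPL" row per file
```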
I'd greatly appreciate it if you could point out where the problem might be. I've been using future (doParallel) for a while in the same cluster setting, and it's been working great, but it recently started giving this error message.