Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel
67 stars 29 forks source link

allow the user to pass an infinite task number #188

Closed Jiefei-Wang closed 2 years ago

Jiefei-Wang commented 2 years ago

In some cases, the user might want the worker to run one element of the job each time, so he can set the task number to Inf to enable it. bptasks does not fully support the infinite number, I simply set it to the maximum of the integer. For example

p <-SerialParam()
bptasks(p) <- Inf
class: SerialParam
  bpisup: FALSE; bpnworkers: 1; bptasks: 2147483647; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
  bpexportglobals: FALSE; bpforceGC: FALSE
  bplogdir: NA
  bpresultdir: NA
mtmorgan commented 2 years ago

Hmm, I'm not a fan of this.

> p = SnowParam(tasks = Inf)
Warning message:
In .prototype_update(.SnowParam_prototype, .clusterargs = clusterargs,  :
  NAs introduced by coercion to integer range
> bptasks(p)
[1] NA
> bptasks(p) = Inf
> bptasks(p)
[1] 2147483647

The patch doesn't support Inf during object construction. I don't like using Inf (real number) as a proxy for .Machine$integer.max (integer). The logic in the patch is pretty convoluted, with Inf > .Macihne$integer.max not triggering an error but other values > .Machine$integer.max complaining about values exceeding the 'maximum limit'. The user assigns Inf, but gets some cryptic number as a result.

Maybe NA_integer_ (NA to the user) would be a suitable alternative? At the same time it seems like one could validate tasks= or bptasks<- to ensure that the value is non-negative (i.e., fail with SnowParam(tasks = -1). Along the lines of the (untested) following...

$ git diff master
diff --git a/R/BiocParallelParam-class.R b/R/BiocParallelParam-class.R
index 1c98929..028115f 100644
--- a/R/BiocParallelParam-class.R
+++ b/R/BiocParallelParam-class.R
@@ -85,6 +85,8 @@ setValidity("BiocParallelParam", function(object)
         msg <- c(msg, "bptasks(BPPARAM) must be an integer")
     if (length(tasks) > 1L)
         msg <- c(msg, "length(bptasks(BPPARAM)) must be == 1")
+    if (!is.na(tasks) && tasks < 0L)
+        msg <- c(msg, "bptasks(BPPARAM) must be >= 0 or 'NA'")

     if (is.character(workers)) {
         if (length(workers) < 1L)
@@ -157,7 +159,8 @@ setReplaceMethod("bptasks", c("BiocParallelParam", "numeric"),
     function(x, value)
 {
     x$tasks <- as.integer(value)
-    x
+    validObject(x)
+    x
 })

 setMethod("bpjobname", "BiocParallelParam",
diff --git a/R/utilities.R b/R/utilities.R
index cc7f416..acca0e9 100644
--- a/R/utilities.R
+++ b/R/utilities.R
@@ -68,7 +68,9 @@
 .ntask <-
     function(X, workers, tasks)
 {
-    if (tasks == 0L) {
+    if (is.na(tasks)) {
+        length(X)
+    } else if (tasks == 0L) {
         workers
     } else {
         min(length(X), tasks)

This changes the behavior for all BiocParallelParam, not just SnowParam...

Jiefei-Wang commented 2 years ago

Thanks, bptask uses NA to represent the infinite task number now.