Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel
65 stars 29 forks source link

Increase depth of traceback beyond `tryCatch()` for bp* functions - possible enhancement #245

Closed MikeDMorgan closed 1 year ago

MikeDMorgan commented 1 year ago

Currently when using bp*() functions, the depth of traceback stored in the bplist_error object is limited to the internal tryCatch() call. The consequence is that lower-level error messages, e.g. using Rcpp::stop() are not fully propagated to the user or other functions designed to catch those lower-level errors.

Would it be possible to extend the error message reporting down to the original R/Rcpp-level error message? A toy example is shown below, with real output from a bplist_error object.

## Inside function (1)
func1 <- function(...){
    # execution code
    bp.list <- bpapply(..., FUN=func2, args)

    # bp.list is the output from bplappyl()
    bperr <- attr(bp.list[[x]], "traceback")

    # then add bperr to return object
    output.list[[x]] <- list(..., "ERROR"=bperr)
    return(output.list)
}

mainfunc <- function(...){
    result.list <- func1(...)

    if(check for other failure condition){
         print(lapply(result.list, `[[`, "ERROR"))
         stop(unique(unlist(lapply(result.list, `[[`, "ERROR"))))
     }
  return(blah blah blah)
}

mainfunc(args)

Example output:

[[140]]
 [1] "37: testNhoods(sim1.mylo, design = ~Condition + (1 | Condition), "                 "        design.df = sim1.meta, glmm.solver = \"Fisher\", fail.on.error = TRUE)"   
 [3] "36: glmmWrapper(Y = dge$counts, disper = 1/dispersion, Xmodel = x.model, "         "        Zmodel = z.model, off.sets = offsets, randlevels = rand.levels, "         
 [5] "        reml = REML, glmm.contr = glmm.cont, genonly = geno.only, "                "35: bptry({"                                                                      
 [7] "        bplapply(seq_len(nrow(Y)), BPOPTIONS = bpoptions(stop.on.error = FALSE), " "            FUN = function(i, Xmodel, Zmodel, Y, off.sets, randlevels, "          
 [9] "34: tryCatch(expr, ..., bplist_error = bplist_error, bperror = bperror)"           "33: tryCatchList(expr, classes, parentenv, handlers)"                             
[11] "32: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), "        "        names[nh], parentenv, handlers[[nh]])"                                    
[13] "31: doTryCatch(return(expr), name, parentenv, handler)"                            "30: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])"                     
[15] "29: tryCatchOne(expr, names, parentenv, handlers[[1L]])"                           "28: doTryCatch(return(expr), name, parentenv, handler)"                           
[17] "27: bplapply(seq_len(nrow(Y)), BPOPTIONS = bpoptions(stop.on.error = FALSE), "     "        FUN = function(i, Xmodel, Zmodel, Y, off.sets, randlevels, "              
[19] "            disper, genonly, kin.ship, glmm.contr, reml) {"                        "26: bplapply(seq_len(nrow(Y)), BPOPTIONS = bpoptions(stop.on.error = FALSE), "    
[21] "        FUN = function(i, Xmodel, Zmodel, Y, off.sets, randlevels, "               "            disper, genonly, kin.ship, glmm.contr, reml) {"                       
[23] "25: .bpinit(manager = manager, X = X, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM, " "        BPOPTIONS = BPOPTIONS, BPREDO = BPREDO)"                                  
[25] "24: bploop(manager, BPPARAM = BPPARAM, BPOPTIONS = BPOPTIONS, ...)"                "23: bploop.lapply(manager, BPPARAM = BPPARAM, BPOPTIONS = BPOPTIONS, "            
[27] "        ...)"                                                                      "22: .bploop_impl(ITER = ITER, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM, "        
[29] "        BPOPTIONS = BPOPTIONS, BPREDO = BPREDO, reducer = reducer, "               "        progress.length = length(redo_index))"                                    
[31] "21: .collect_result(manager, reducer, progress, BPPARAM)"                          "20: .manager_recv(manager)"                                                       
[33] "19: .manager_recv(manager)"                                                        "18: .recv_any(manager$backend)"                                                   
[35] "17: .recv_any(manager$backend)"                                                    "16: .bpworker_EXEC(msg, bplog(backend$BPPARAM))"                                  
[37] "15: tryCatch({"                                                                    "        do.call(msg$data$fun, msg$data$args)"                                     
[39] "    }, error = function(e) {"                                                      "14: tryCatchList(expr, classes, parentenv, handlers)"                             
[41] "13: tryCatchOne(expr, names, parentenv, handlers[[1L]])"                           "12: doTryCatch(return(expr), name, parentenv, handler)"                           
[43] "11: do.call(msg$data$fun, msg$data$args)"                                          "10: (function (...) "                                                             
[45] "    BiocParallel:::.workerLapply_impl(...))(X = 1:140, FUN = function (i, "        "        Xmodel, Zmodel, Y, off.sets, randlevels, disper, genonly, "               
[47] "9: BiocParallel:::.workerLapply_impl(...)"                                         "8: do.call(lapply, args)"                                                         
[49] "7: (function (X, FUN, ...) "                                                       "   {"                                                                             
[51] "       FUN <- match.fun(FUN)"                                                      "6: FUN(X[[i]], ...)"                                                              
[53] "5: withCallingHandlers({"                                                          "       tryCatch({"                                                                
[55] "           FUN(...)"                                                               "4: tryCatch({"                                                                    
[57] "       FUN(...)"                                                                   "   }, error = handle_error)"                                                      
[59] "3: tryCatchList(expr, classes, parentenv, handlers)"                               "2: tryCatchOne(expr, names, parentenv, handlers[[1L]])"                           
[61] "1: value[[3L]](cond)" 

This shows that the traceback only goes down to the tryCatch() internal to bplapply(...) in this case.

mtmorgan commented 1 year ago

Here's a more reproducible example

Without BiocParallel

g = function() stop("stopping in 'g'")
f = function(i) if (i == 2) g()
lapply(1:4, f)
traceback()
## 4: stop("stopping in 'g'") at #1
## 3: g() at #1
## 2: FUN(X[[i]], ...)
## 1: lapply(1:4, f)

With BiocParallel, traceback() initially points to the failed iteration on the manager.

bplapply(1:4, f, BPPARAM = SnowParam())
traceback()
## 4: stop(.error_bplist(res)) at bpinit.R#54
## 3: .bpinit(manager = manager, X = X, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM, 
##        BPOPTIONS = BPOPTIONS, BPREDO = BPREDO) at bplapply-methods.R#57
## 2: bplapply(1:4, f, BPPARAM = SnowParam()) at AllGenerics.R#2
## 1: bplapply(1:4, f, BPPARAM = SnowParam())

Using bptry() gets us the results from the worker, but it's long, complicated, in the reverse order (deepest call is last, instead of first) and uninformative because it doesn't go deeper than a tryCatch() executed at the top level of each function call (to capture remote errors and to allow for bpstopOnError).

result <- bptry(bplapply(1:4, f, BPPARAM = SnowParam()))
length(attr(result[[2]], "traceback"))
## [1] 45

attr(result[[2]], "traceback") |> tail(6) |> cat(sep = "\n")
## 4: tryCatch({
##        FUN(...)
##    }, error = handle_error)
## 3: tryCatchList(expr, classes, parentenv, handlers)
## 2: tryCatchOne(expr, names, parentenv, handlers[[1L]])
## 1: value[[3L]](cond)

Running the same command under MulticoreParam gets a long traceback (60 calls deep) but no more information. Running under SerialParam is no more informative.

I think the code here https://github.com/Bioconductor/BiocParallel/blob/43d8b2c1ba66068e620e7749b37e96742ea85a3d/R/worker.R#L199 could be reworked so that the call stack at the time the error occurs can be caught (withCallingHandlers() rather than tryCatch(), only the call stack on the worker is reported (by trimming the stack to things below the call), and in the same order as the base traceback() (by reversing the output of sys.calls() at https://github.com/Bioconductor/BiocParallel/blob/43d8b2c1ba66068e620e7749b37e96742ea85a3d/R/worker.R#L178