deephaven / deephaven-core

Deephaven Community Core
Other
255 stars 81 forks source link

R `test_local` unit tests fail on repeated runs #4165

Closed chipkent closed 1 year ago

chipkent commented 1 year ago

When test_local unit tests are rerun, they fail on the third execution.

alexpeters1208 commented 1 year ago

The R client can be made to segfault by running unit tests via test_local() three times. No other commands to run the same unit tests produce this result, and the segfault always occurs on the third run of the unit tests. This behavior is unchanged if the server is stopped and started again between runs, so long as you stay in the R session.

test_local() is a command in the testthat library that runs all of the unit tests in a local R package from the source code. Importantly, all of the tests in test_local() are run in the same R session as the caller. Conversely, test_file() is used to run the unit tests in a given testing file, and each test is run in a new R session. This method of running the tests does not exhibit the same segfault behavior. Finally, test_package() can be used to run all of the unit tests in an installed R package (compiled to .so), so long as the tests were installed along with the package. This also does not exhibit the same segfault behavior.

In the reproduction below, I have deleted test_client_options_wrapper and test_table_handle_wrapper, and only have a single test in test_client_wrapper:

test_that("client connection works in the simple case of anonymous authentication", {

    # assumes correctness of client options
    client_options <- ClientOptions$new()

    # TODO: assumes server is actually running on localhost:10000, this is probably bad for CI
    expect_no_error(client <- Client$new(target="localhost:10000", client_options=client_options))
})

So, only this testing code gets run with each invocation of test_local().

Steps to reproduce Start your R session, import the testing library, and run the local tests as follows (note that the following output only has one test run, which is a client connection):

alexpeters@alex-peters-r-dev:~/deephaven/src/deephaven/deephaven-core/R/rdeephaven$ R -d valgrind
==502165== Memcheck, a memory error detector
==502165== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==502165== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==502165== Command: /usr/lib/R/bin/exec/R
==502165== 

R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(testthat)
> test_local("/home/alexpeters/deephaven/src/deephaven/deephaven-core/R/rdeephaven")
✔ | F W S  OK | Context
⠏ |         0 | client_wrapper
==502165== Conditional jump or move depends on uninitialised value(s)
==502165==    at 0x1A9B8A57: int UnwindImpl<false, false>(void**, int*, int, int, void const*, int*) (stacktrace_x86-inl.inc:317)
==502165==    by 0x1A9B8865: Unwind<false, false> (stacktrace.cc:78)
==502165==    by 0x1A9B8865: absl::lts_20211102::GetStackTrace(void**, int, int) (stacktrace.cc:103)
==502165==    by 0x1A9B0D78: absl::lts_20211102::(anonymous namespace)::GetStack(void**, int) (mutex.cc:1318)
==502165==    by 0x1A9C6047: absl::lts_20211102::synchronization_internal::GraphCycles::UpdateStackTrace(absl::lts_20211102::synchronization_internal::GraphId, int, int (*)(void**, int)) (graphcycles.cc:679)
==502165==    by 0x1A9B0E3E: absl::lts_20211102::DeadlockCheck(absl::lts_20211102::Mutex*) (mutex.cc:1347)
==502165==    by 0x1A9B1348: absl::lts_20211102::DebugOnlyDeadlockCheck(absl::lts_20211102::Mutex*) (mutex.cc:1415)
==502165==    by 0x1A9B166A: absl::lts_20211102::Mutex::Lock() (mutex.cc:1472)
==502165==    by 0x1A93D977: gpr_mu_lock (sync_abseil.cc:56)
==502165==    by 0x1A94639D: grpc_core::(anonymous namespace)::ThreadInternalsPosix::Start() (thd_posix.cc:165)
==502165==    by 0x1A530D75: grpc_core::Thread::Start() (thd.h:132)
==502165==    by 0x1A632F8A: grpc_core::Executor::SetThreading(bool) (executor.cc:173)
==502165==    by 0x1A632ACE: grpc_core::Executor::Init() (executor.cc:97)
==502165== 
==502165== Conditional jump or move depends on uninitialised value(s)
==502165==    at 0x1A9B8F12: void** NextStackFrame<true, false>(void**, void const*, unsigned long, unsigned long) (stacktrace_x86-inl.inc:260)
==502165==    by 0x1A9B8A70: int UnwindImpl<false, false>(void**, int*, int, int, void const*, int*) (stacktrace_x86-inl.inc:322)
==502165==    by 0x1A9B8865: Unwind<false, false> (stacktrace.cc:78)
==502165==    by 0x1A9B8865: absl::lts_20211102::GetStackTrace(void**, int, int) (stacktrace.cc:103)
==502165==    by 0x1A9B0D78: absl::lts_20211102::(anonymous namespace)::GetStack(void**, int) (mutex.cc:1318)
==502165==    by 0x1A9C6047: absl::lts_20211102::synchronization_internal::GraphCycles::UpdateStackTrace(absl::lts_20211102::synchronization_internal::GraphId, int, int (*)(void**, int)) (graphcycles.cc:679)
==502165==    by 0x1A9B0E3E: absl::lts_20211102::DeadlockCheck(absl::lts_20211102::Mutex*) (mutex.cc:1347)
==502165==    by 0x1A9B1348: absl::lts_20211102::DebugOnlyDeadlockCheck(absl::lts_20211102::Mutex*) (mutex.cc:1415)
==502165==    by 0x1A9B166A: absl::lts_20211102::Mutex::Lock() (mutex.cc:1472)
==502165==    by 0x1A93D977: gpr_mu_lock (sync_abseil.cc:56)
==502165==    by 0x1A94639D: grpc_core::(anonymous namespace)::ThreadInternalsPosix::Start() (thd_posix.cc:165)
==502165==    by 0x1A530D75: grpc_core::Thread::Start() (thd.h:132)
==502165==    by 0x1A632F8A: grpc_core::Executor::SetThreading(bool) (executor.cc:173)
==502165== 
✔ |         1 | client_wrapper [6.2s]                                                                                                                                                                                                   

══ Results ══════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]

> test_local("/home/alexpeters/deephaven/src/deephaven/deephaven-core/R/rdeephaven")
✔ | F W S  OK | Context
✔ |         1 | client_wrapper                                                                                                                                                                                                          

══ Results ══════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]

> test_local("/home/alexpeters/deephaven/src/deephaven/deephaven-core/R/rdeephaven")
✔ | F W S  OK | Context
⠏ |         0 | client_wrapper
==502165== Conditional jump or move depends on uninitialised value(s)
==502165==    at 0x26A42A57: int UnwindImpl<false, false>(void**, int*, int, int, void const*, int*) (stacktrace_x86-inl.inc:317)
==502165==    by 0x26A42865: Unwind<false, false> (stacktrace.cc:78)
==502165==    by 0x26A42865: absl::lts_20211102::GetStackTrace(void**, int, int) (stacktrace.cc:103)
==502165==    by 0x26A3AD78: absl::lts_20211102::(anonymous namespace)::GetStack(void**, int) (mutex.cc:1318)
==502165==    by 0x26A50047: absl::lts_20211102::synchronization_internal::GraphCycles::UpdateStackTrace(absl::lts_20211102::synchronization_internal::GraphId, int, int (*)(void**, int)) (graphcycles.cc:679)
==502165==    by 0x26A3AE3E: absl::lts_20211102::DeadlockCheck(absl::lts_20211102::Mutex*) (mutex.cc:1347)
==502165==    by 0x26A3B348: absl::lts_20211102::DebugOnlyDeadlockCheck(absl::lts_20211102::Mutex*) (mutex.cc:1415)
==502165==    by 0x26A3B66A: absl::lts_20211102::Mutex::Lock() (mutex.cc:1472)
==502165==    by 0x269C7977: gpr_mu_lock (sync_abseil.cc:56)
==502165==    by 0x269D039D: grpc_core::(anonymous namespace)::ThreadInternalsPosix::Start() (thd_posix.cc:165)
==502165==    by 0x265BAD75: grpc_core::Thread::Start() (thd.h:132)
==502165==    by 0x266BCF8A: grpc_core::Executor::SetThreading(bool) (executor.cc:173)
==502165==    by 0x266BCACE: grpc_core::Executor::Init() (executor.cc:97)
==502165== 
==502165== Conditional jump or move depends on uninitialised value(s)
==502165==    at 0x26A42F12: void** NextStackFrame<true, false>(void**, void const*, unsigned long, unsigned long) (stacktrace_x86-inl.inc:260)
==502165==    by 0x26A42A70: int UnwindImpl<false, false>(void**, int*, int, int, void const*, int*) (stacktrace_x86-inl.inc:322)
==502165==    by 0x26A42865: Unwind<false, false> (stacktrace.cc:78)
==502165==    by 0x26A42865: absl::lts_20211102::GetStackTrace(void**, int, int) (stacktrace.cc:103)
==502165==    by 0x26A3AD78: absl::lts_20211102::(anonymous namespace)::GetStack(void**, int) (mutex.cc:1318)
==502165==    by 0x26A50047: absl::lts_20211102::synchronization_internal::GraphCycles::UpdateStackTrace(absl::lts_20211102::synchronization_internal::GraphId, int, int (*)(void**, int)) (graphcycles.cc:679)
==502165==    by 0x26A3AE3E: absl::lts_20211102::DeadlockCheck(absl::lts_20211102::Mutex*) (mutex.cc:1347)
==502165==    by 0x26A3B348: absl::lts_20211102::DebugOnlyDeadlockCheck(absl::lts_20211102::Mutex*) (mutex.cc:1415)
==502165==    by 0x26A3B66A: absl::lts_20211102::Mutex::Lock() (mutex.cc:1472)
==502165==    by 0x269C7977: gpr_mu_lock (sync_abseil.cc:56)
==502165==    by 0x269D039D: grpc_core::(anonymous namespace)::ThreadInternalsPosix::Start() (thd_posix.cc:165)
==502165==    by 0x265BAD75: grpc_core::Thread::Start() (thd.h:132)
==502165==    by 0x266BCF8A: grpc_core::Executor::SetThreading(bool) (executor.cc:173)
==502165== 
==502165== Invalid read of size 4
==502165==    at 0x2663C3BF: grpc_core::ExecCtx::starting_cpu() (exec_ctx.h:129)
==502165==    by 0x266B711F: grpc_core::CallCombiner::Start(grpc_closure*, grpc_error*, char const*, int, char const*) (call_combiner.cc:139)
==502165==    by 0x267491D2: grpc_core::FilterStackCall::ExecuteBatch(grpc_transport_stream_op_batch*, grpc_closure*) (call.cc:729)
==502165==    by 0x2674C90A: grpc_core::FilterStackCall::StartBatch(grpc_op const*, unsigned long, void*, bool) (call.cc:1755)
==502165==    by 0x1A6C30FA: grpc_call_start_batch (call.cc:1894)
==502165==    by 0x1A531B74: grpc::CoreCodegen::grpc_call_start_batch(grpc_call*, grpc_op const*, unsigned long, void*, void*) (core_codegen.cc:113)
==502165==    by 0x259E855A: grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpRecvInitialMetadata, grpc::internal::CallOpRecvMessage<google::protobuf::MessageLite>, grpc::internal::CallOpClientSendClose, grpc::internal::CallOpClientRecvStatus>::ContinueFillOpsAfterInterception() (call_op_set.h:976)
==502165==    by 0x259E82D6: grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpRecvInitialMetadata, grpc::internal::CallOpRecvMessage<google::protobuf::MessageLite>, grpc::internal::CallOpClientSendClose, grpc::internal::CallOpClientRecvStatus>::FillOps(grpc::internal::Call*) (call_op_set.h:906)
==502165==    by 0x265C3263: grpc::Channel::PerformOpsOnCall(grpc::internal::CallOpSetInterface*, grpc::internal::Call*) (channel_cc.cc:166)
==502165==    by 0x259C9934: grpc::internal::Call::PerformOps(grpc::internal::CallOpSetInterface*) (call.h:68)
==502165==    by 0x259E3B48: grpc::internal::BlockingUnaryCallImpl<google::protobuf::MessageLite, google::protobuf::MessageLite>::BlockingUnaryCallImpl(grpc::ChannelInterface*, grpc::internal::RpcMethod const&, grpc::ClientContext*, google::protobuf::MessageLite const&, google::protobuf::MessageLite*) (client_unary_call.h:82)
==502165==    by 0x259E2F4B: grpc::Status grpc::internal::BlockingUnaryCall<io::deephaven::proto::backplane::grpc::ConfigurationConstantsRequest, io::deephaven::proto::backplane::grpc::ConfigurationConstantsResponse, google::protobuf::MessageLite, google::protobuf::MessageLite>(grpc::ChannelInterface*, grpc::internal::RpcMethod const&, grpc::ClientContext*, io::deephaven::proto::backplane::grpc::ConfigurationConstantsRequest const&, io::deephaven::proto::backplane::grpc::ConfigurationConstantsResponse*) (in /tmp/RtmpDZIUHx/pkgload7a9951026c2bc/rdeephaven.so)
==502165==  Address 0x30 is not stack'd, malloc'd or (recently) free'd
==502165== 

 *** caught segfault ***
address 0x30, cause 'memory not mapped'

Traceback:
 1: new_CppObject_xp(fields$.module, fields$.pointer, ...)
 2: Rcpp::cpp_object_initializer(.self, .refClassDef, ...)
 3: .Object$initialize(...)
 4: initialize(value, ...)
 5: initialize(value, ...)
 6: new(INTERNAL_Client, target = target, client_options = client_options$internal_client_options)
 7: initialize(...)
 8: Client$new(target = "localhost:10000", client_options = client_options)
 9: eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
10: withCallingHandlers(expr, condition = function(cnd) {    {        .__handler_frame__. <- TRUE        .__setup_frame__. <- frame        if (inherits(cnd, "message")) {            except <- c("warning", "error")        }        else if (inherits(cnd, "warning")) {            except <- "error"        }        else {            except <- ""        }    }    while (!is_null(cnd)) {        if (inherits(cnd, "error")) {            out <- handlers[[1L]](cnd)            if (!inherits(out, "rlang_zap"))                 throw(out)        }        inherit <- .subset2(.subset2(cnd, "rlang"), "inherit")        if (is_false(inherit)) {            return()        }        cnd <- .subset2(cnd, "parent")    }})
11: doTryCatch(return(expr), name, parentenv, handler)
12: tryCatchOne(expr, names, parentenv, handlers[[1L]])
13: tryCatchList(expr, classes, parentenv, handlers)
14: tryCatch(withCallingHandlers(expr, condition = function(cnd) {    {        .__handler_frame__. <- TRUE        .__setup_frame__. <- frame        if (inherits(cnd, "message")) {            except <- c("warning", "error")        }        else if (inherits(cnd, "warning")) {            except <- "error"        }        else {            except <- ""        }    }    while (!is_null(cnd)) {        if (inherits(cnd, "error")) {            out <- handlers[[1L]](cnd)            if (!inherits(out, "rlang_zap"))                 throw(out)        }        inherit <- .subset2(.subset2(cnd, "rlang"), "inherit")        if (is_false(inherit)) {            return()        }        cnd <- .subset2(cnd, "parent")    }}), stackOverflowError = handlers[[1L]])
15: try_fetch(code, `:=`(!!base_class, function(cnd) {    if (!matcher(cnd)) {        return(zap())    }    expected <- paste0("Expected ", quo_label(enquo(object)),         " to run without any ", base_class, "s", if (!is.null(class))             paste0(" of class '", class, "'"), if (!is.null(regexp))             paste0(" matching pattern '", regexp, "'"), ".")    actual <- paste0("Actually got a <", class(cnd)[[1]], "> with text:\n",         indent_lines(rlang::cnd_message(cnd)))    message <- format_error_bullets(c(expected, i = actual))    fail(message, trace_env = error_call)}))
16: .capture(act$val <- eval_bare(quo_get_expr(.quo), quo_get_env(.quo)),     ...)
17: quasi_capture(enquo(object), NULL, capture)
18: expect_no_("error", {    {        object    }}, regexp = message, class = class)
19: expect_no_error(client <- Client$new(target = "localhost:10000",     client_options = client_options))
20: eval(code, test_env)
21: eval(code, test_env)
22: withCallingHandlers({    eval(code, test_env)    if (!handled && !is.null(test)) {        skip_empty()    }}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,     message = handle_message, error = handle_error)
23: doTryCatch(return(expr), name, parentenv, handler)
24: tryCatchOne(expr, names, parentenv, handlers[[1L]])
25: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
26: doTryCatch(return(expr), name, parentenv, handler)
27: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]),     names[nh], parentenv, handlers[[nh]])
28: tryCatchList(expr, classes, parentenv, handlers)
29: tryCatch(withCallingHandlers({    eval(code, test_env)    if (!handled && !is.null(test)) {        skip_empty()    }}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,     message = handle_message, error = handle_error), error = handle_fatal,     skip = function(e) {    })
30: test_code(desc, code, env = parent.frame(), default_reporter = local_interactive_reporter())
31: test_that("client connection works in the simple case of anonymous authentication",     {        client_options <- ClientOptions$new()        expect_no_error(client <- Client$new(target = "localhost:10000",             client_options = client_options))    })
32: eval(code, test_env)
33: eval(code, test_env)
34: withCallingHandlers({    eval(code, test_env)    if (!handled && !is.null(test)) {        skip_empty()    }}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,     message = handle_message, error = handle_error)
35: doTryCatch(return(expr), name, parentenv, handler)
36: tryCatchOne(expr, names, parentenv, handlers[[1L]])
37: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
38: doTryCatch(return(expr), name, parentenv, handler)
39: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]),     names[nh], parentenv, handlers[[nh]])
40: tryCatchList(expr, classes, parentenv, handlers)
41: tryCatch(withCallingHandlers({    eval(code, test_env)    if (!handled && !is.null(test)) {        skip_empty()    }}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,     message = handle_message, error = handle_error), error = handle_fatal,     skip = function(e) {    })
42: test_code(test = NULL, code = exprs, env = env, default_reporter = StopReporter$new())
43: source_file(path, env(env), wrap = wrap)
44: FUN(X[[i]], ...)
45: lapply(test_paths, test_one_file, env = env, wrap = wrap)
46: doTryCatch(return(expr), name, parentenv, handler)
47: tryCatchOne(expr, names, parentenv, handlers[[1L]])
48: tryCatchList(expr, classes, parentenv, handlers)
49: tryCatch(code, testthat_abort_reporter = function(cnd) {    cat(conditionMessage(cnd), "\n")    NULL})
50: with_reporter(reporters$multi, lapply(test_paths, test_one_file,     env = env, wrap = wrap))
51: test_files_serial(test_dir = test_dir, test_package = test_package,     test_paths = test_paths, load_helpers = load_helpers, reporter = reporter,     env = env, stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning,     wrap = wrap, load_package = load_package)
52: test_files(test_dir = path, test_paths = test_paths, test_package = package,     reporter = reporter, load_helpers = load_helpers, env = env,     stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning,     wrap = wrap, load_package = load_package, parallel = parallel)
53: test_dir(test_path, package = package, reporter = reporter, ...,     load_package = load_package)
54: test_local("/home/alexpeters/deephaven/src/deephaven/deephaven-core/R/rdeephaven")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Note that the relevant segfault memory address 0x30 is always the address of the segfault, which is too low in memory to be the address of a pointer.

Debugging

Here is a trace from GDB:

(gdb) bt
#0  0x00007fffc6a33e0b in grpc_core::ExecCtx::starting_cpu (this=0x0) at /home/alexpeters/deephaven/dhcpp/src/grpc/src/core/lib/iomgr/exec_ctx.h:129
#1  0x00007fffc6aae984 in grpc_core::CallCombiner::Start (this=0x555558e09070, closure=0x555558e0a160, error=0x0, file=0x7fffc74a3a28 "/home/alexpeters/deephaven/dhcpp/src/grpc/src/core/lib/surface/call.cc", line=729, 
    reason=0x7fffc74a3cb2 "executing batch") at /home/alexpeters/deephaven/dhcpp/src/grpc/src/core/lib/iomgr/call_combiner.cc:139
#2  0x00007fffc6b40a37 in grpc_core::FilterStackCall::ExecuteBatch (this=0x555558e09030, batch=0x555558e0a088, start_batch_closure=0x555558e0a160) at /home/alexpeters/deephaven/dhcpp/src/grpc/src/core/lib/surface/call.cc:729
#3  0x00007fffc6b4416f in grpc_core::FilterStackCall::StartBatch (this=0x555558e09030, ops=0x7ffffff24880, nops=6, notify_tag=0x7ffffff24bc0, is_notify_tag_closure=false)
    at /home/alexpeters/deephaven/dhcpp/src/grpc/src/core/lib/surface/call.cc:1755
#4  0x00007fffeba4495f in grpc_call_start_batch (call=0x555558e09030, ops=0x7ffffff24880, nops=6, tag=0x7ffffff24bc0, reserved=0x0) at /home/alexpeters/deephaven/dhcpp/src/grpc/src/core/lib/surface/call.cc:1894
#5  0x00007fffeb8b3641 in grpc::CoreCodegen::grpc_call_start_batch (this=0x55555a708280, call=0x555558e09030, ops=0x7ffffff24880, nops=6, tag=0x7ffffff24bc0, reserved=0x0)
    at /home/alexpeters/deephaven/dhcpp/src/grpc/src/cpp/common/core_codegen.cc:113
#6  0x00007fffc5df0124 in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpRecvInitialMetadata, grpc::internal::CallOpRecvMessage<google::protobuf::MessageLite>, grpc::internal::CallOpClientSendClose, grpc::internal::CallOpClientRecvStatus>::ContinueFillOpsAfterInterception() () from /tmp/RtmpFVMZTV/pkgloada622e4844a428/rdeephaven.so
#7  0x00007fffc5defebb in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpRecvInitialMetadata, grpc::internal::CallOpRecvMessage<google::protobuf::MessageLite>, grpc::internal::CallOpClientSendClose, grpc::internal::CallOpClientRecvStatus>::FillOps(grpc::internal::Call*) () from /tmp/RtmpFVMZTV/pkgloada622e4844a428/rdeephaven.so
#8  0x00007fffc69bacd8 in grpc::Channel::PerformOpsOnCall (this=0x55555db96dc0, ops=0x7ffffff24bc0, call=0x7ffffff24b50) at /home/alexpeters/deephaven/dhcpp/src/grpc/src/cpp/client/channel_cc.cc:166
#9  0x00007fffc5dd34d9 in grpc::internal::Call::PerformOps (this=0x7ffffff24b50, ops=0x7ffffff24bc0) at /home/alexpeters/deephaven/dhcpp/src/grpc/include/grpcpp/impl/codegen/call.h:68
#10 0x00007fffc5deb6e8 in grpc::internal::BlockingUnaryCallImpl<google::protobuf::MessageLite, google::protobuf::MessageLite>::BlockingUnaryCallImpl(grpc::ChannelInterface*, grpc::internal::RpcMethod const&, grpc::ClientContext*, google::protobuf::MessageLite const&, google::protobuf::MessageLite*) () from /tmp/RtmpFVMZTV/pkgloada622e4844a428/rdeephaven.so
#11 0x00007fffc5dead37 in grpc::Status grpc::internal::BlockingUnaryCall<io::deephaven::proto::backplane::grpc::ConfigurationConstantsRequest, io::deephaven::proto::backplane::grpc::ConfigurationConstantsResponse, google::protobuf::--Type <RET> for more, q to quit, c to continue without paging--ret
MessageLite, google::protobuf::MessageLite>(grpc::ChannelInterface*, grpc::internal::RpcMethod const&, grpc::ClientContext*, io::deephaven::proto::backplane::grpc::ConfigurationConstantsRequest const&, io::deephaven::proto::backplane::grpc::ConfigurationConstantsResponse*) () from /tmp/RtmpFVMZTV/pkgloada622e4844a428/rdeephaven.so
#12 0x00007fffc5de8758 in io::deephaven::proto::backplane::grpc::ConfigService::Stub::GetConfigurationConstants(grpc::ClientContext*, io::deephaven::proto::backplane::grpc::ConfigurationConstantsRequest const&, io::deephaven::proto::backplane::grpc::ConfigurationConstantsResponse*) () from /tmp/RtmpFVMZTV/pkgloada622e4844a428/rdeephaven.so
#13 0x00007fffc5d75620 in deephaven::client::server::Server::createFromTarget(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, deephaven::client::ClientOptions const&) ()
   from /tmp/RtmpFVMZTV/pkgloada622e4844a428/rdeephaven.so
#14 0x00007fffc5cd6cf6 in deephaven::client::Client::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, deephaven::client::ClientOptions const&) ()
   from /tmp/RtmpFVMZTV/pkgloada622e4844a428/rdeephaven.so
#15 0x00007fffc5cb7369 in newClientWrapper (target=..., client_options=...) at client.cpp:237
#16 0x00007fffc5ccceb2 in Rcpp::Factory_2<ClientWrapper, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ClientOptionsWrapper const&>::get_new (nargs=-894688, args=0x7ffffff259a0, 
    this=<optimized out>) at /home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include/Rcpp/module/Module_generated_Factory.h:66
#17 Rcpp::class_<ClientWrapper>::newInstance (this=<optimized out>, args=0x7ffffff259a0, nargs=-894688) at /home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/include/Rcpp/module/class.h:142
#18 0x00007fffef5f9efa in class__newInstance (args=<optimized out>) at module.cpp:143

This points a finger at gRPC's exec_ctx class, which you can find here. This class is "a bag of data that collects information along a callstack. It is created on the stack at core entry points (public API or iomgr), and stored internally as a thread-local variable." This class is special, and the fact that the error is occurring here means that it is occurring very early on in the Client creation process, with the call to client <- Client$new(...). In fact, this error is occurring before the call to create a client gets to the Deephaven server, indicating that the server cannot be at fault. This is verifiable by reproducing this issue without a running Deephaven server, which will still exhibit the same problem.

The segfault is occurring at line 129 of exec_ctx.h in the ExecCtx class with this code:

if (starting_cpu_ == std::numeric_limits<unsigned>::max()) {...

The attempt to read starting_cpu_ results in the segfault. This is consistent with starting_cpu_ being null, and further inspection revealed that just before the crash, this (with respect to ExecCtx) had a value of 0. So, at some point in the creation of ExecCtx, we had a null pointer where we should not have.

To explore this, we added some printfs to grpc_call_start_batch, the function responsible for initializing ExecCtx, and created a new global function to track the address of something known to be constant, called global_func. Here are the results:

> library("testthat")
> test_local(".")
...
from grpc_call_start_batch, global_func has address 0x7fffea7b3eda
inside StartBatch, global_func has address 0x7fffea7b3eda
...

> test_local(".")
from grpc_call_start_batch, global_func has address 0x7fffea7b3eda
inside StartBatch, global_func has address 0x7fffea7b3eda

> test_local(".")
from grpc_call_start_batch, global_func has address 0x7fffea7b3eda
inside StartBatch, global_func has address 0x7fffc4fb3eda

From this, we can clearly see that on the third call to test_local(), the address of a global function has changed from underneath is, which could very well help explain the null ExecCtx.

We suspect that this is due to grpc getting loaded twice, which appears to be a result of rdeephaven.so being loaded multiple times. We can confirm that rdeephaven.so is indeed getting loaded multiple times by running R with ltrace, which is a Linux command to intercept and record dynamic library calls:

alexpeters@alex-peters-r-dev:~/deephaven/src/deephaven/deephaven-core/R/rdeephaven$ R -d "ltrace -e dlopen+dlclose -s 128"
libR.so->dlopen("/usr/lib/R/library/methods/libs/methods.so", 2)                                                                                = 0x55f435b2cfe0

R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

libR.so->dlopen("/usr/lib/R/library/utils/libs/utils.so", 2)                                                                                    = 0x55f436e19370
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
libR.so->dlopen("/usr/lib/R/library/grDevices/libs/grDevices.so", 2)                                                                            = 0x55f435f25e30
libR.so->dlopen("/usr/lib/R/library/graphics/libs/graphics.so", 2)                                                                              = 0x55f436361ec0
libR.so->dlopen("/usr/lib/R/library/stats/libs/stats.so", 2)                                                                                    = 0x55f4360ecac0
> library(testthat)
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/rlang/libs/rlang.so", 2)                                                    = 0x55f4379912f0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/cli/libs/cli.so", 2)                                                        = 0x55f437ffa220
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/brio/libs/brio.so", 2)                                                      = 0x55f437754e90
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/magrittr/libs/magrittr.so", 2)                                              = 0x55f43605a0d0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/testthat/libs/testthat.so", 2)                                              = 0x55f438044d80
> test_local("/home/alexpeters/deephaven/src/deephaven/deephaven-core/R/rdeephaven")
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
libR.so->dlopen("/usr/lib/R/modules//lapack.so", 2)                                                                                             = 0x55f43782b050
--- SIGCHLD (Child exited) ---
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/processx/libs/processx.so", 2)                                              = 0x55f43736bf30
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/ps/libs/ps.so", 2)                                                          = 0x55f43639e490
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/libs/Rcpp.so", 2)                                                      = 0x55f437a5c9b0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/bit/libs/bit.so", 2)                                                        = 0x55f43a34e770
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/bit64/libs/bit64.so", 2)                                                    = 0x55f43a8bd0e0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/glue/libs/glue.so", 2)                                                      = 0x55f43aba40c0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/vctrs/libs/vctrs.so", 2)                                                    = 0x55f43ab334c0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/purrr/libs/purrr.so", 2)                                                    = 0x55f43a631850
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/arrow/libs/arrow.so", 2)                                                    = 0x55f437b73980
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/fansi/libs/fansi.so", 2)                                                    = 0x55f43b4be990
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/utf8/libs/utf8.so", 2)                                                      = 0x55f43b4f4da0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/tibble/libs/tibble.so", 2)                                                  = 0x55f43a5addd0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/dplyr/libs/dplyr.so", 2)                                                    = 0x55f438a53530
libR.so->dlopen("/usr/lib/R/library/tools/libs/tools.so", 2)                                                                                    = 0x55f4383ab8c0
libR.so->dlopen("/tmp/RtmpYYgsuy/pkgloadb8f2e5cc64599/rdeephaven.so", 2)                                                                        = 0x55f43b545220
--- SIGCHLD (Child exited) ---
✔ | F W S  OK | Context
✔ |         1 | client_wrapper                                                                                                                                                                                                          

══ Results ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
> test_local("/home/alexpeters/deephaven/src/deephaven/deephaven-core/R/rdeephaven")
void deephaven::client::utility::Executor::shutdown()@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:26: Executor 'Client executor' shutdown requested
static void deephaven::client::utility::Executor::threadStart(std::shared_ptr<deephaven::client::utility::Executor>)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:58: Executor 'Client executor' thread exiting
void deephaven::client::utility::Executor::shutdown()@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:26: Executor 'Flight executor' shutdown requested
static void deephaven::client::utility::Executor::threadStart(std::shared_ptr<deephaven::client::utility::Executor>)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:58: Executor 'Flight executor' thread exiting
void deephaven::client::server::Server::shutdown()@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/server/server.cc:239: Server shutdown requested
static void deephaven::client::server::Server::sendKeepaliveMessages(const std::shared_ptr<deephaven::client::server::Server>&)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/server/server.cc:610: Keepalive thread exiting
static void deephaven::client::server::Server::processCompletionQueueForever(const std::shared_ptr<deephaven::client::server::Server>&)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/server/server.cc:529: Process completion queue thread exiting
libR.so->dlopen("/tmp/RtmpYYgsuy/pkgloadb8f2e77dafac/rdeephaven.so", 2)                                                                         = 0x55f4363eca80
✔ | F W S  OK | Context
✔ |         1 | client_wrapper                                                                                                                                                                                                          

══ Results ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
> test_local("/home/alexpeters/deephaven/src/deephaven/deephaven-core/R/rdeephaven")
void deephaven::client::utility::Executor::shutdown()@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:26: Executor 'Client executor' shutdown requested
static void deephaven::client::utility::Executor::threadStart(std::shared_ptr<deephaven::client::utility::Executor>)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:58: Executor 'Client executor' thread exiting
void deephaven::client::utility::Executor::shutdown()@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:26: Executor 'Flight executor' shutdown requested
static void deephaven::client::utility::Executor::threadStart(std::shared_ptr<deephaven::client::utility::Executor>)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:58: Executor 'Flight executor' thread exiting
void deephaven::client::server::Server::shutdown()@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/server/server.cc:239: Server shutdown requested
static void deephaven::client::server::Server::processCompletionQueueForever(const std::shared_ptr<deephaven::client::server::Server>&)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/server/server.cc:529: Process completion queue thread exiting
static void deephaven::client::server::Server::sendKeepaliveMessages(const std::shared_ptr<deephaven::client::server::Server>&)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/server/server.cc:610: Keepalive thread exiting
libR.so->dlopen("/tmp/RtmpYYgsuy/pkgloadb8f2e1420a56d/rdeephaven.so", 2)                                                                        = 0x55f43b52cb50
✔ | F W S  OK | Context
⠏ |         0 | client_wrapper                                                                                                                                                                                                          --- SIGSEGV (Segmentation fault) ---

 *** caught segfault ***
address 0x30, cause 'memory not mapped'

Here, you can see a different version of rdeephaven.so being loaded every time, due to test_local() creating temporary installs from the package source code to run its tests. This behavior is not replicated with test_package(), which is shown to only load rdeephaven.so once:

alexpeters@alex-peters-r-dev:~/deephaven/src/deephaven/deephaven-core/R/rdeephaven$ R -d "ltrace -e dlopen+dlclose -s 128"
libR.so->dlopen("/usr/lib/R/library/methods/libs/methods.so", 2)                                                                                = 0x556131908fc0

R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

libR.so->dlopen("/usr/lib/R/library/utils/libs/utils.so", 2)                                                                                    = 0x556132bf5340
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
libR.so->dlopen("/usr/lib/R/library/grDevices/libs/grDevices.so", 2)                                                                            = 0x556131d01e00
libR.so->dlopen("/usr/lib/R/library/graphics/libs/graphics.so", 2)                                                                              = 0x55613213de90
libR.so->dlopen("/usr/lib/R/library/stats/libs/stats.so", 2)                                                                                    = 0x556131ec8a90
> library(testthat)
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/rlang/libs/rlang.so", 2)                                                    = 0x55613376d2d0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/cli/libs/cli.so", 2)                                                        = 0x556133dd61f0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/brio/libs/brio.so", 2)                                                      = 0x556133531020
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/magrittr/libs/magrittr.so", 2)                                              = 0x556131e360a0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/testthat/libs/testthat.so", 2)                                              = 0x556133e20d50
> test_package("rdeephaven")
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
libR.so->dlopen("/usr/lib/R/modules//lapack.so", 2)                                                                                             = 0x5561322f6d90
--- SIGCHLD (Child exited) ---
Loading required package: Rcpp
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/Rcpp/libs/Rcpp.so", 2)                                                      = 0x556133bba7b0
Loading required package: arrow
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/bit/libs/bit.so", 2)                                                        = 0x55613421f790
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/bit64/libs/bit64.so", 2)                                                    = 0x556133c0c2b0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/glue/libs/glue.so", 2)                                                      = 0x5561320f5580
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/vctrs/libs/vctrs.so", 2)                                                    = 0x556133863f30
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/purrr/libs/purrr.so", 2)                                                    = 0x556131e7a480
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/arrow/libs/arrow.so", 2)                                                    = 0x5561317f8ec0
Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.

Attaching package: ‘arrow’

The following object is masked from ‘package:testthat’:

    matches

The following object is masked from ‘package:utils’:

    timestamp

Loading required package: R6
Loading required package: dplyr
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/fansi/libs/fansi.so", 2)                                                    = 0x556135de4660
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/utf8/libs/utf8.so", 2)                                                      = 0x556135eca110
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/tibble/libs/tibble.so", 2)                                                  = 0x5561368e17b0
libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/dplyr/libs/dplyr.so", 2)                                                    = 0x5561341c68e0

Attaching package: ‘dplyr’

The following object is masked from ‘package:testthat’:

    matches

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

libR.so->dlopen("/home/alexpeters/R/x86_64-pc-linux-gnu-library/4.3/rdeephaven/libs/rdeephaven.so", 2)                                          = 0x556134cecd80
--- SIGCHLD (Child exited) ---
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
libR.so->dlopen("/usr/lib/R/library/tools/libs/tools.so", 2)                                                                                    = 0x556131302580
void deephaven::client::utility::Executor::shutdown()@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:26: Executor 'Client executor' shutdown requested
static void deephaven::client::utility::Executor::threadStart(std::shared_ptr<deephaven::client::utility::Executor>)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:58: Executor 'Client executor' thread exiting
void deephaven::client::utility::Executor::shutdown()@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:26: Executor 'Flight executor' shutdown requested
static void deephaven::client::utility::Executor::threadStart(std::shared_ptr<deephaven::client::utility::Executor>)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/utility/executor.cc:58: Executor 'Flight executor' thread exiting
void deephaven::client::server::Server::shutdown()@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/server/server.cc:239: Server shutdown requested
static void deephaven::client::server::Server::processCompletionQueueForever(const std::shared_ptr<deephaven::client::server::Server>&)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/server/server.cc:529: Process completion queue thread exiting
static void deephaven::client::server::Server::sendKeepaliveMessages(const std::shared_ptr<deephaven::client::server::Server>&)@/home/alexpeters/deephaven/src/deephaven/deephaven-core/cpp-client/deephaven/client/src/server/server.cc:610: Keepalive thread exiting
> test_package("rdeephaven")
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
> test_package("rdeephaven")
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]

Based on the available data, it seems reasonable to conclude that this bug is a result of test_local() loading new installs of rdeephaven.so with each invocation, which in turn is inappropriately loading multiple instances of gRPC. There is no planned fix for this, and it seems that the best advice is to not use test_local().

jcferretti commented 1 year ago

For the record one thing I investigated yesterday was how R arrow package manages the grpc dependency. I was curious if we were 'unique') in doing one mega rdeephave.so shares object with all our dependencies riding along in one single file, and maybe the standard thing being packages just having other shared objects outside as dependencies (eg, we don't build stdc++ in, we depend on that from the environment, and test_local does not manually dlopen those again.

Turns out arrow does the same with its dependencies (eg, grpc) than we do

It does use some different way to create the wrapping tho, ie, it does not depend on Rcpp. But thats a different difference. In terms of shared objects it does the same.