(Issue closed by @mostynb 2 months ago.)
Hi @mostynb, do you have the arguments that were passed to reproxy (they should be in the reproxy.INFO log file)? Also, is this failing in rewrapper or reproxy?
I don't think reproxy crashed; dumpstats successfully stopped reproxy at the end of the build, after the failure.
A tool similar to ninja ran a command of the form rewrapper.sh clang++ <flags> foo.mm and reported that the exit code was 2; the console displayed the stack trace above and nothing else. That bash script ran something of the form: exec rewrapper -exec_strategy local --labels=type=compile,lang=cpp,compiler=clang -log_dir <log dir> -server_address unix:///tmp/reproxy.sock -dial_timeout 5s <the clang++ compile command>
Crashing inside the generated REAPI bindings init function is kind of unexpected. I wonder if this might be memory corruption :/
reproxy was started with something like this:
exec reproxy -instance foo -server_address unix:///tmp/reproxy.sock -service <cache server address> -rpc_timeouts GetActionResult=10s,default=30s --service_no_security -service_no_auth -proxy_log_dir <log dir> -log_dir <log dir> -compression_threshold -1
The flags mentioned in the reproxy.INFO file:
Command line flags:
--alsologtostderr=false \
--auxiliary_metadata_path= \
--cache_dir= \
--cache_silo= \
--cas_concurrency=500 \
--cas_service= \
--cfg= \
--clang_depscan_archive=false \
--clang_depscan_ignored_plugins= \
--clean_include_paths=false \
--compression_threshold=1 \
--cpp_dependency_scanner_plugin= \
--credential_file= \
--creds_file= \
--deps_cache_max_mb=128 \
--depsscanner_address=execrel:// \
--download_buffer_size=10000 \
--download_tick_duration=50ms \
--download_tmp_dir= \
--dump_input_tree=false \
--enable_creds_cache=true \
--enable_deps_cache=false \
--experimental_cache_miss_rate=0 \
--experimental_credentials_helper= \
--experimental_credentials_helper_args= \
--experimental_exit_on_stuck_actions=false \
--experimental_goma_deps_cache=false \
--experimental_sysroot_do_not_upload=false \
--fail_early_min_action_count=0 \
--fail_early_min_fallback_ratio=0 \
--fail_early_window=0s \
--gcert_refresh_timeout=0 \
--grpc_keepalive_permit_without_stream=false \
--grpc_keepalive_time=0s \
--grpc_keepalive_timeout=20s \
--instance=foo \
--ip_reset_min_delay=3m0s \
--ip_timeout=10m0s \
--local_resource_fraction=1 \
--log_backtrace_at= \
--log_dir=. \
--log_format=reducedtext \
--log_http_calls=false \
--log_keep_duration=24h0m0s \
--log_link= \
--log_path= \
--logbuflevel=0 \
--logtostderr=false \
--max_concurrent_requests_per_conn=25 \
--max_concurrent_streams_per_conn=25 \
--max_listen_size_kb=8192 \
--metrics_labels= \
--metrics_namespace= \
--metrics_prefix= \
--metrics_project= \
--min_grpc_connections=5 \
--mismatch_ignore_config_path= \
--num_records_to_keep=0 \
--pprof_file= \
--pprof_mem_file= \
--pprof_port=0 \
--profiler_project_id= \
--profiler_service= \
--proxy_idle_timeout=6h0m0s \
--proxy_log_dir=. \
--racing_bias=0.75 \
--racing_tmp_dir= \
--remote_disabled=false \
--round_robin_balancer_pool_size=25 \
--rpc_timeouts=GetActionResult=10s,default=30s \
--server_address=unix:///tmp/reproxy.sock \
--service=<SERVER ADDRESS WAS HERE> \
--service_no_auth=true \
--service_no_security=true \
--shadow_header_detection=false \
--startup_capabilities=true \
--stderrthreshold=2 \
--tls_ca_cert= \
--tls_client_auth_cert= \
--tls_client_auth_key= \
--tls_server_name= \
--upload_buffer_size=10000 \
--upload_tick_duration=50ms \
--use_application_default_credentials=false \
--use_batches=true \
--use_external_auth_token=false \
--use_gce_credentials=false \
--use_gcloud_creds=false \
--use_google_prod_creds=false \
--use_round_robin_balancer=true \
--use_rpc_credentials=true \
--use_unified_cas_ops=false \
--use_unified_downloads=false \
--use_unified_uploads=false \
--v=0 \
--version=false \
--version_cache_silo=false \
--version_sdk=false \
--vmodule= \
--wait_for_shutdown_rpc=false \
--xattr_digest=
golang.org/protobuf/reflect/protoregistry became a transitive dependency of ours as of version 0.142, but we do not explicitly call golang.org/protobuf/reflect/protoregistry.(*Files).RegisterFile in our code; the crash in comment#1 occurred at this line inside RegisterFile().
comment#3 confirmed that reproxy was given the command line flag --auxiliary_metadata_path= with an empty value, which should make reproxy skip all of the proto-reflection-related logic: https://github.com/bazelbuild/reclient/blob/285f5247c7455f81ca8964874fd4bc5822c921b2/cmd/reproxy/main.go#L493C1-L505C1, and rewrapper has nothing to do with the proto reflection logic either. We will need to investigate more to understand why rewrapper crashed.
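The guard being described, where an empty flag value short-circuits all of the reflection setup, can be sketched roughly as follows. The names here (auxiliaryMetadataPath, setupProtoReflection) are hypothetical stand-ins, not the actual identifiers in cmd/reproxy/main.go; see the link above for the real code.

```go
package main

import (
	"flag"
	"fmt"
)

var auxiliaryMetadataPath = flag.String("auxiliary_metadata_path", "",
	"path to an auxiliary metadata descriptor file (empty disables it)")

// setupProtoReflection stands in for the reflection-related logic that
// would eventually call into google.golang.org/protobuf's registry.
// With an empty path it returns immediately, so none of that code runs.
func setupProtoReflection(path string) (enabled bool) {
	if path == "" {
		return false // empty flag value: skip all proto-reflection logic
	}
	// ...load the descriptor file and register it...
	return true
}

func main() {
	flag.Parse()
	fmt.Println("reflection enabled:", setupProtoReflection(*auxiliaryMetadataPath))
}
```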
@mostynb By any chance, if you also have Linux, Windows, or a non-Intel Mac, does 0.146.0.0c7ca4be crash on those machines?
I will see if I can find any other instances of this failure, but it might take a couple of days.
Ok this stack trace seems to come from initialization of rewrapper. Running it under gdb:
(gdb) bt
#0 google.golang.org/protobuf/reflect/protoregistry.(*Files).RegisterFile (r=0xc00029c090, file=..., ~r0=...)
at external/org_golang_google_protobuf/reflect/protoregistry/registry.go:173
#1 0x00000000006855df in google.golang.org/protobuf/internal/filetype.(*resolverByIndex).RegisterFile (.this=0xc0002a4180, .anon0=..., .anon0=...)
at <autogenerated>:1
#2 0x00000000005bf65d in google.golang.org/protobuf/internal/filedesc.Builder.Build (db=..., out=...)
at external/org_golang_google_protobuf/internal/filedesc/build.go:112
#3 0x0000000000681ec5 in google.golang.org/protobuf/internal/filetype.Builder.Build (tb=..., out=...)
at external/org_golang_google_protobuf/internal/filetype/build.go:138
#4 0x0000000000692138 in google.golang.org/protobuf/types/descriptorpb.file_google_protobuf_descriptor_proto_init ()
at external/org_golang_google_protobuf/types/descriptorpb/descriptor.pb.go:4345
#5 0x0000000000691f37 in google.golang.org/protobuf/types/descriptorpb.init.0 ()
at external/org_golang_google_protobuf/types/descriptorpb/descriptor.pb.go:3982
#6 0x0000000000446ae6 in runtime.doInit (t=0xdb5480 <google.golang.org/protobuf/types/descriptorpb.[inittask]>) at GOROOT/src/runtime/proc.go:6525
#7 0x0000000000446a31 in runtime.doInit (t=0xdb8c00 <google.golang.org/protobuf/reflect/protodesc.[inittask]>) at GOROOT/src/runtime/proc.go:6502
#8 0x0000000000446a31 in runtime.doInit (t=0xdbaa40 <github.com/golang/protobuf/proto.[inittask]>) at GOROOT/src/runtime/proc.go:6502
#9 0x0000000000446a31 in runtime.doInit (t=0xdb8020 <google.golang.org/grpc/credentials.[inittask]>) at GOROOT/src/runtime/proc.go:6502
#10 0x0000000000446a31 in runtime.doInit (t=0xdb8520 <google.golang.org/grpc/internal/channelz.[inittask]>) at GOROOT/src/runtime/proc.go:6502
#11 0x0000000000446a31 in runtime.doInit (t=0xdb1700 <google.golang.org/grpc/channelz.[inittask]>) at GOROOT/src/runtime/proc.go:6502
#12 0x0000000000446a31 in runtime.doInit (t=0xdb84a0 <google.golang.org/grpc/balancer.[inittask]>) at GOROOT/src/runtime/proc.go:6502
#13 0x0000000000446a31 in runtime.doInit (t=0xdbfee0 <google.golang.org/grpc.[inittask]>) at GOROOT/src/runtime/proc.go:6502
#14 0x0000000000446a31 in runtime.doInit (t=0xdb6080 <github.com/bazelbuild/reclient/internal/pkg/ipc.[inittask]>) at GOROOT/src/runtime/proc.go:6502
#15 0x0000000000446a31 in runtime.doInit (t=0xdb9e40 <main.[inittask]>) at GOROOT/src/runtime/proc.go:6502
#16 0x00000000004394c6 in runtime.main () at GOROOT/src/runtime/proc.go:233
#17 0x0000000000469021 in runtime.goexit () at src/runtime/asm_amd64.s:1598
The IPC package in rewrapper initializes gRPC, which eventually leads to the line that crashed. The only thing in this stack of code that has changed recently is the gRPC balancer we use as part of remote-apis-sdks (https://github.com/bazelbuild/remote-apis-sdks/blob/574c71c40d33c8bbbed19b22821b57b3e084b887/go/pkg/balancer/gcp_balancer.go#L8). That may be what causes RegisterFile() to be called now, though I have no idea why it would crash.
Is the source directory on a FUSE filesystem, or running within a sandbox? That could be another avenue for such corruption.
@mostynb By any chance, if you also have Linux, Windows or non-intel Mac, does 0.146.0.0c7ca4be crash on these machines?
I have only found this one instance of the crash (but I am unable to search many days back).
Is the source directory a fuse filesystem or running within a sandbox?
I don't think so; we use Veertu's Anka macOS VMs.
I am assuming this hasn't reproduced since then (and we have also recently updated a lot of our dependencies). Since we have no repro, I'm not sure what the action item for us would be here, so I'm closing the bug.
Feel free to reopen if you find a definitive cause for the failure!
I saw this rewrapper crash in CI on an Intel Mac when using reclient 0.146.0.0c7ca4be: