Open jtattermusch opened 4 years ago
Related PR https://github.com/grpc/grpc/pull/20732
CC @veblush @nlopezgi
Hi @jtattermusch , it seems you are pinned to old versions of the ubsan configs here? Can you try to use https://github.com/bazelbuild/bazel-toolchains/tree/master/configs/experimental/ubuntu16_04_clang/1.3/bazel_1.0.0/ubsan and let me know if the problem persists? Note you need to update these manually with new versions of Bazel (its still the only moving part for your whole toolchain setup that requires manual update, afaik).
@nlopezgi I will try to update the bazel config for ubsan (I'm actually testing the config here: https://github.com/grpc/grpc/pull/20732/files#diff-caf6876ad86b4eda47d7152ba8d93ebeR95), but I'm seeing the same problem with ASAN and TSAN (and those are using rbe_autoconfig).
Update: I was never seeing the problem for UBSAN, but rather for ASAN and TSAN.
looping in @ilya-biryukov for advice
The minimal repro:
Dockerfile
:
FROM marketplace.gcr.io/google/rbe-ubuntu16-04
ADD main.cc /root
WORKDIR /root
# removing -fsanitize=address will make the test pass
RUN clang++ -std=c++11 -fsanitize=address main.cc
RUN ./a.out
main.cc
:
int main()
{
std::string teststr = "";
for (int i = 0; i < 10000; i++)
{
teststr += "a";
}
std::regex testregex(".*");
std::smatch res;
std::regex_match(teststr, res, testregex);
}
This will result in a segfault "stack overflow":
SUMMARY: AddressSanitizer: stack-overflow /tmp/clang-build/src/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22:3 in __asan_memcpy
According to my tests, the stack overflow happens for any string longer than ~2000 characters. I'd be interested in understanding:
This seems like a long-standing bug in libstdc++, there were no changes to the compiler or C++ libraries that should either fix or surface this problem. I can reproduce the problem with your Dockerfile and the hashes for both releases of bazel-toolchains
:
# bazel-toolchains 0.29.5
FROM gcr.io/cloud-marketplace/google/rbe-ubuntu16-04@sha256:bc6a2ad47b24d01a73da315dd288a560037c51a95cc77abb837b26fef1408798
# bazel-toolchains 1.1.0
#FROM gcr.io/cloud-marketplace/google/rbe-ubuntu16-04@sha256:0151b16320e3132a9899022567ce2b1c3d237b20913140410b9037317d333672
However, this might point to an issue in bazel-toolchains
: TSAN builds should not use system libstdc++. Instead, a TSAN-instrumented version of libc++
should be used. @nlopezgi, could it be that we accidentally switched from using the tsan build of libc++
in bazel-toolchains-0.29.5
to using system libstdc++
in bazel-toolchains-1.0
?
This seems like a long-standing bug in libstdc++, there were no changes to the compiler or C++ libraries that should either fix or surface this problem. I can reproduce the problem with your Dockerfile and the hashes for both releases of
bazel-toolchains
:# bazel-toolchains 0.29.5 FROM gcr.io/cloud-marketplace/google/rbe-ubuntu16-04@sha256:bc6a2ad47b24d01a73da315dd288a560037c51a95cc77abb837b26fef1408798 # bazel-toolchains 1.1.0 #FROM gcr.io/cloud-marketplace/google/rbe-ubuntu16-04@sha256:0151b16320e3132a9899022567ce2b1c3d237b20913140410b9037317d333672
However, this might point to an issue in
bazel-toolchains
: TSAN builds should not use system libstdc++. Instead, a TSAN-instrumented version oflibc++
should be used. @nlopezgi, could it be that we accidentally switched from using the tsan build oflibc++
inbazel-toolchains-0.29.5
to using systemlibstdc++
inbazel-toolchains-1.0
?
This is not the first time I'm hearing about the problem of accidentally using libstdc++ instead of libc++ in RBE sanitizers. I believe @veblush has run into that in a different context.
What is our next step to work around this problem? Can a new patched version of the docker image be released? It's blocking our update to bazel 1.0.0.
Related issue: https://github.com/GoogleCloudPlatform/layer-definitions/issues/531 (but doesn't offer any workaround except upgrading to ubuntu1804 image (which doesn't seem to be ready yet)
Note that UBSAN does not require instrumenting the standard library, so updating to Ubuntu 18.04 can fix UBSan failures. However, using TSAN with non-TSAN build of any standard library will inevitably lead to other false-positives.
@nlopezgi should know how to update the docker image and the toolchains. Also happy to help with figuring out the right flags/coming up with a test case checking we're using the proper standard library...
- On that note, what is the easiest way of telling which exact docker image and tag was used for running a particular test when using rbe_autoconfig?
you can run bazel query --output=build @rbe_default//config:platform
which will print out the details of the generated platform, including the docker image.
You can also over ride which image to use by setting digest
directly.
However, this might point to an issue in
bazel-toolchains
: TSAN builds should not use system libstdc++. Instead, a TSAN-instrumented version oflibc++
should be used. @nlopezgi, could it be that we accidentally switched from using the tsan build oflibc++
inbazel-toolchains-0.29.5
to using systemlibstdc++
inbazel-toolchains-1.0
?
We have not made any changes to the TSAN configuration in ages, we barely test this (with abseil), so we really depend on gRPC maintaining this themselves.
What is our next step to work around this problem? Can a new patched version of the docker image be released? It's blocking our update to bazel 1.0.0.
It can as soon as we find what is the actual fix.
Related issue: #531 (but doesn't offer any workaround except upgrading to ubuntu1804 image (which doesn't seem to be ready yet)
The ubuntu1804 image is now ready (and has been for almost a month). afaik, its just waiting for you folks to test it with grpc, but there is nothing else on our side to provide for this.
@nlopezgi should know how to update the docker image and the toolchains. Also happy to help with figuring out the right flags/coming up with a test case checking we're using the proper standard library...
I'm not sure what is wrong here that we would need to update the docker image for. We have never had a container with a TSAN-instrumented version of libc++ and creating one would likely mean we need to publish a new container (the rbe ubuntu container comes with an msan instrumented version of libc++, can we even have both in the same container and have Bazel behave properly?)
I will check to see if I can find at which point in the past the example repro that @jtattermusch sent stopped working and see if we can find what changed from there.
I tried the example repro that @jtattermusch sent with several versions of the rbe-ubuntu 16_04 container, all the way to the first one we published (gcr.io/cloud-marketplace/google/rbe-ubuntu16-04@sha256:59bf0e191a6b5cc1ab62c2224c810681d1326bad5a27b1d36c9f40113e79da7f) it fails with every single one. I'm not sure what I can do to help fix this. @jtattermusch could you try to create a repro that works with an older version of the container?
@nlopezgi could it be the case we've been using system libstdc++
for TSAN builds all along?
We must ensure the TSAN builds use libc++ from the docker image, not the system libstdc++
. That would both fix the current issue and ensure TSAN presubmits do not produce any false-positives on calls into the C++ standard library.
It's quite possible we have had that setup improperly all along. Some Q's:
Would we need to install a different version of libc++ than what we currently have?
Yes, and we would need libcxx-tsan_r####
, e.g. here's the latest build for Ubuntu 16.04.
Currently grpc just sets these linkopts for msan: https://github.com/grpc/grpc/blob/master/WORKSPACE#L41
There should be some other place with additional configuration. I would guess that we also generate a custom crosstool as we need to pass additional arguments for the compilation step, at least -fsanitize=thread
and flags to instruct clang to use libc++ instead of (the default) libstdc++.
Can both versions of libc++ be installed at the same time in a single container?
Yes, it's possible to install two versions of libc++
into the same container. However, we have to pass additional flags to ensure clang's toolchain auto-detection does not pick the wrong version of libc++
.
How will we tell Bazel which one to use?
We have to pass:
-nostdinc++ -I$LIBCXX_PATH/include/c++/v1
-nostdlib++ -L$LIBCXX_PATH/lib -Wl,-rpath=$LIBCXX_PATH/lib -lc++
grpc also passes -lc++abi
and -lm
, I don't think they're required, but not 100% certain.
After upgrading to bazel 1.0.0, I'm hitting a std::regex bug in TSAN and UBSAN configurations (on bazel 0.29.1 there was no problem).
https://source.cloud.google.com/results/invocations/0ea14e18-2fbd-4412-9ba2-6b1ddad3d683/targets
Looks like the problem related to a bug in std::regex? https://stackoverflow.com/questions/36304204/c-regex-segfault-on-long-sequences