GoogleCloudPlatform / layer-definitions

This repository contains Bazel targets for Google-maintained common container tool layer definitions and their dependencies.
Apache License 2.0
26 stars 16 forks source link

std::regex issue with ubuntu16_04_clang TSAN (on bazel 1.0.0) #591

Open jtattermusch opened 4 years ago

jtattermusch commented 4 years ago

After upgrading to bazel 1.0.0, I'm hitting a std::regex bug in TSAN and UBSAN configurations (on bazel 0.29.1 there was no problem).

https://source.cloud.google.com/results/invocations/0ea14e18-2fbd-4412-9ba2-6b1ddad3d683/targets

Looks like the problem related to a bug in std::regex? https://stackoverflow.com/questions/36304204/c-regex-segfault-on-long-sequences

test/core/client_channel/xds_bootstrap_test(_ZNSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE6_M_dfsENSH_11_Match_modeEl+0x1a8)[0x521618]
test/core/client_channel/xds_bootstrap_test(_ZNSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE6_M_dfsENSH_11_Match_modeEl+0xe1c)[0x52228c]
test/core/client_channel/xds_bootstrap_test(_ZNSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE16_M_rep_once_moreENSH_11_Match_modeEl+0x338)[0x5237f8]
test/core/client_channel/xds_bootstrap_test(_ZNSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE6_M_dfsENSH_11_Match_modeEl+0x4f2)[0x521962]
test/core/client_channel/xds_bootstrap_test(_ZNSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE6_M_dfsENSH_11_Match_modeEl+0xe1c)[0x52228c]
test/core/client_channel/xds_bootstrap_test(_ZNSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE16_M_rep_once_moreENSH_11_Match_modeEl+0x338)[0x5237f8]
test/core/client_channel/xds_bootstrap_test(_ZNSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE6_M_dfsENSH_11_Match_modeEl+0x4f2)[0x521962]
test/core/client_channel/xds_bootstrap_test(_ZNSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE6_M_dfsENSH_11_Match_modeEl+0xe1c)[0x52228c]
test/core/client_channel/xds_bootstrap_test(_ZNSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE16_M_rep_once_moreENSH_11_Match_modeEl+0x338)[0x5237f8]
test/core/client_channel/xds_bootstrap_test(_ZNSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE6_M_dfsENSH_11_Match_modeEl+0x4f2)[0x521962]
jtattermusch commented 4 years ago

Related PR https://github.com/grpc/grpc/pull/20732

jtattermusch commented 4 years ago

CC @veblush @nlopezgi

nlopezgi commented 4 years ago

Hi @jtattermusch , it seems you are pinned to old versions of the ubsan configs here? Can you try to use https://github.com/bazelbuild/bazel-toolchains/tree/master/configs/experimental/ubuntu16_04_clang/1.3/bazel_1.0.0/ubsan and let me know if the problem persists? Note you need to update these manually with new versions of Bazel (its still the only moving part for your whole toolchain setup that requires manual update, afaik).

jtattermusch commented 4 years ago

@nlopezgi I will try to update the bazel config for ubsan (I'm actually testing the config here: https://github.com/grpc/grpc/pull/20732/files#diff-caf6876ad86b4eda47d7152ba8d93ebeR95), but I'm seeing the same problem with ASAN and TSAN (and those are using rbe_autoconfig).

Update: I was never seeing the problem for UBSAN, but rather for ASAN and TSAN.

nlopezgi commented 4 years ago

looping in @ilya-biryukov for advice

jtattermusch commented 4 years ago

The minimal repro:

Dockerfile:

FROM marketplace.gcr.io/google/rbe-ubuntu16-04

ADD main.cc /root

WORKDIR /root

# removing -fsanitize=address will make the test pass
RUN clang++ -std=c++11 -fsanitize=address main.cc

RUN ./a.out

main.cc:

int main()
{
    std::string teststr = "";
    for (int i = 0; i < 10000; i++)
    {
        teststr += "a";
    }

    std::regex testregex(".*");

    std::smatch res;
    std::regex_match(teststr, res, testregex);
}

This will result in a segfault "stack overflow": SUMMARY: AddressSanitizer: stack-overflow /tmp/clang-build/src/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22:3 in __asan_memcpy

jtattermusch commented 4 years ago

According to my tests, the stack overflow happens for any string longer than ~2000 characters. I'd be interested in understanding:

ilya-biryukov commented 4 years ago

This seems like a long-standing bug in libstdc++, there were no changes to the compiler or C++ libraries that should either fix or surface this problem. I can reproduce the problem with your Dockerfile and the hashes for both releases of bazel-toolchains:

# bazel-toolchains 0.29.5
FROM gcr.io/cloud-marketplace/google/rbe-ubuntu16-04@sha256:bc6a2ad47b24d01a73da315dd288a560037c51a95cc77abb837b26fef1408798

# bazel-toolchains 1.1.0
#FROM gcr.io/cloud-marketplace/google/rbe-ubuntu16-04@sha256:0151b16320e3132a9899022567ce2b1c3d237b20913140410b9037317d333672

However, this might point to an issue in bazel-toolchains: TSAN builds should not use system libstdc++. Instead, a TSAN-instrumented version of libc++ should be used. @nlopezgi, could it be that we accidentally switched from using the tsan build of libc++ in bazel-toolchains-0.29.5 to using system libstdc++ in bazel-toolchains-1.0?

jtattermusch commented 4 years ago

This seems like a long-standing bug in libstdc++, there were no changes to the compiler or C++ libraries that should either fix or surface this problem. I can reproduce the problem with your Dockerfile and the hashes for both releases of bazel-toolchains:

# bazel-toolchains 0.29.5
FROM gcr.io/cloud-marketplace/google/rbe-ubuntu16-04@sha256:bc6a2ad47b24d01a73da315dd288a560037c51a95cc77abb837b26fef1408798

# bazel-toolchains 1.1.0
#FROM gcr.io/cloud-marketplace/google/rbe-ubuntu16-04@sha256:0151b16320e3132a9899022567ce2b1c3d237b20913140410b9037317d333672

However, this might point to an issue in bazel-toolchains: TSAN builds should not use system libstdc++. Instead, a TSAN-instrumented version of libc++ should be used. @nlopezgi, could it be that we accidentally switched from using the tsan build of libc++ in bazel-toolchains-0.29.5 to using system libstdc++ in bazel-toolchains-1.0?

This is not the first time I'm hearing about the problem of accidentally using libstdc++ instead of libc++ in RBE sanitizers. I believe @veblush has run into that in a different context.

What is our next step to work around this problem? Can a new patched version of the docker image be released? It's blocking our update to bazel 1.0.0.

jtattermusch commented 4 years ago

Related issue: https://github.com/GoogleCloudPlatform/layer-definitions/issues/531 (but doesn't offer any workaround except upgrading to ubuntu1804 image (which doesn't seem to be ready yet)

ilya-biryukov commented 4 years ago

Note that UBSAN does not require instrumenting the standard library, so updating to Ubuntu 18.04 can fix UBSan failures. However, using TSAN with non-TSAN build of any standard library will inevitably lead to other false-positives.

@nlopezgi should know how to update the docker image and the toolchains. Also happy to help with figuring out the right flags/coming up with a test case checking we're using the proper standard library...

nlopezgi commented 4 years ago
  • On that note, what is the easiest way of telling which exact docker image and tag was used for running a particular test when using rbe_autoconfig?

you can run bazel query --output=build @rbe_default//config:platform which will print out the details of the generated platform, including the docker image. You can also over ride which image to use by setting digest directly.

However, this might point to an issue in bazel-toolchains: TSAN builds should not use system libstdc++. Instead, a TSAN-instrumented version of libc++ should be used. @nlopezgi, could it be that we accidentally switched from using the tsan build of libc++ in bazel-toolchains-0.29.5 to using system libstdc++ in bazel-toolchains-1.0?

We have not made any changes to the TSAN configuration in ages, we barely test this (with abseil), so we really depend on gRPC maintaining this themselves.

What is our next step to work around this problem? Can a new patched version of the docker image be released? It's blocking our update to bazel 1.0.0.

It can as soon as we find what is the actual fix.

Related issue: #531 (but doesn't offer any workaround except upgrading to ubuntu1804 image (which doesn't seem to be ready yet)

The ubuntu1804 image is now ready (and has been for almost a month). afaik, its just waiting for you folks to test it with grpc, but there is nothing else on our side to provide for this.

@nlopezgi should know how to update the docker image and the toolchains. Also happy to help with figuring out the right flags/coming up with a test case checking we're using the proper standard library...

I'm not sure what is wrong here that we would need to update the docker image for. We have never had a container with a TSAN-instrumented version of libc++ and creating one would likely mean we need to publish a new container (the rbe ubuntu container comes with an msan instrumented version of libc++, can we even have both in the same container and have Bazel behave properly?)

I will check to see if I can find at which point in the past the example repro that @jtattermusch sent stopped working and see if we can find what changed from there.

nlopezgi commented 4 years ago

I tried the example repro that @jtattermusch sent with several versions of the rbe-ubuntu 16_04 container, all the way to the first one we published (gcr.io/cloud-marketplace/google/rbe-ubuntu16-04@sha256:59bf0e191a6b5cc1ab62c2224c810681d1326bad5a27b1d36c9f40113e79da7f) it fails with every single one. I'm not sure what I can do to help fix this. @jtattermusch could you try to create a repro that works with an older version of the container?

ilya-biryukov commented 4 years ago

@nlopezgi could it be the case we've been using system libstdc++ for TSAN builds all along? We must ensure the TSAN builds use libc++ from the docker image, not the system libstdc++. That would both fix the current issue and ensure TSAN presubmits do not produce any false-positives on calls into the C++ standard library.

nlopezgi commented 4 years ago

It's quite possible we have had that setup improperly all along. Some Q's:

ilya-biryukov commented 4 years ago

Would we need to install a different version of libc++ than what we currently have?

Yes, and we would need libcxx-tsan_r####, e.g. here's the latest build for Ubuntu 16.04.

Currently grpc just sets these linkopts for msan: https://github.com/grpc/grpc/blob/master/WORKSPACE#L41

There should be some other place with additional configuration. I would guess that we also generate a custom crosstool as we need to pass additional arguments for the compilation step, at least -fsanitize=thread and flags to instruct clang to use libc++ instead of (the default) libstdc++.

Can both versions of libc++ be installed at the same time in a single container?

Yes, it's possible to install two versions of libc++ into the same container. However, we have to pass additional flags to ensure clang's toolchain auto-detection does not pick the wrong version of libc++.

How will we tell Bazel which one to use?

We have to pass: