Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Clang-overlay of <intrin.h> breaks <intrin0.h> in VS 2019 version 16.8p1 #46068

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR47099
Status NEW
Importance P enhancement
Reported by Billy O'Neal (billy.oneal@gmail.com)
Reported on 2020-08-10 14:32:51 -0700
Last modified on 2021-02-19 12:09:49 -0800
Version 10.0
Hardware PC Windows NT
CC alex_toresh@yahoo.fr, Casey@Carter.net, craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, richard-llvm@metafoo.co.uk, rnk@google.com, zufuliu@163.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

Previously reported by a Visual Studio customer as https://developercommunity.visualstudio.com/content/problem/1144026/visual-studio-version-1680-preview-10-no-longer-co.html

The standard libraries have an <intrin0.h> where we declare intrinsics used by the standard library headers as a throughput optimization, because <intrin.h> is huge and causes measurable throughput costs to #include <atomic>.

As part of implementing C++20, we needed new intrinsics for <bit> so we moved them from <intrin.h> to <intrin0.h>. Unfortunately, that is breaking whatever overlay mechanism Clang on Windows uses to select its version of <intrin.h> because it tries to declare _tzcnt_u32 and _tzcnt_u64 as object-like macros.

We are interested in investigating a scheme whereby our official <intrin.h> would do #ifdef __clang__, then #include <whatever clang wants.hpp> or similar so that we would have a more firmly established contract for clang's extensions rather than needing overlays and #include_next.

Billy ONeal Visual C++ Libraries

Quuxplusone commented 4 years ago

Per the Intel Intrinsics Guide, _tzcnt_u32 is declared by :

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=tzcnt&expand=5972

Clang provides its own , which declares this intrinsic as a macro. If is providing its own definition of _tzcnt_u32, rather than getting the one from , that's going to cause problems.

Now, Clang also provides a complete replacement for , and doesn't use the one provided by MSVC. Clang's replacement (indirectly) includes , which is where _tzcnt_u32 usually comes from when we target Windows.

It seems like there are a couple of different approaches we could take here:

1) We continue to replace with our own implementation, and extend that to also cover . We'd need to know which intrinsics should be provided by so that we can expose the proper set. No changes on the Visual C++ side.

2) Your detects clang and includes the relevant header from the Intel Intrinsics Guide.

If we want to avoid pulling in all of into your standard library, option 1 seems like the way to go to me. Do you have any documentation for what should provide?

Quuxplusone commented 4 years ago

Alternatively, could you define the functions in terms of Clang _builtin functions under #ifdef __clang__, and not include any intrinsic header at all in that case (and more generally never include when the compiler is Clang)?

Quuxplusone commented 4 years ago

Clang also provides a complete replacement for

Not quite, because it does #include_next which ends up pulling in ours.

not include any intrinsic header at all in that case (and more generally never include when the compiler is Clang)?

That's the first thing I tried; unfortunately that doesn't work because the user can include themselves which clobbers the attempt to use the intrinsic in .

This is one of those places that hasn't been really 'designed' because we never had to worry about a compiler changing this outside of the corresponding library parts.

1) We continue to replace with our own implementation, and extend that to also cover . We'd need to know which intrinsics should be provided by

That set is constantly changing; includes and we have not before considered 'promoting' an intrinsic from one to the other a breaking change, but it has broken bits here.

2) Your detects clang and includes the relevant header from the Intel Intrinsics Guide.

I tried something like this too, making our intrin0.h go "if clang, then #include " to pick up your overridden version. Unfortunately due to the aforementioned include_next that causes circular include hell since that tries to include clang's intrin.h, which include_next's our intrin.h which then again tries to include intrin0.h...

That's why I think getting rid of what's forcing you folks to use include_next is going to be a key part of whatever solution on which we land.

Quuxplusone commented 4 years ago
I believe our header is doing

#ifndef _MSC_VER
#include_next <intrin.h>
#else
// provide the intrinsics
#endif

So if we're not pretending to be MSVC we don't define anything and just pass on
to the next include.

I think _MSC_VER being defined is controlled by -fms-compatibility-version.
Quuxplusone commented 4 years ago

(In reply to Billy O'Neal from comment #3)

Clang also provides a complete replacement for

Not quite, because it does #include_next which ends up pulling in ours.

The #include_next is under an #ifndef _MSC_VER. Clang defines _MSC_VER itself when targeting Windows. I think the intent is that the #include_next is unreachable on Windows, and instead exists only to make our <intrin.h> be as invisible as possible when not targeting Windows.

If we're including your <intrin.h> instead of our own, then perhaps something strange is happening there. (Or perhaps I'm misunderstanding what the #ifndef is doing.) Can you check whether Clang is defining _MSC_VER in your test environment?

not include any intrinsic header at all in that case (and more generally never include when the compiler is Clang)?

That's the first thing I tried; unfortunately that doesn't work because the user can include themselves which clobbers the attempt to use the intrinsic in .

I'm not sure I understand. If (when compiling with Clang) your directly uses Clang's __builtin_popcount and friends, and doesn't include , then I would think that nothing that uses should be clobbered by . What am I missing?

This is one of those places that hasn't been really 'designed' because we never had to worry about a compiler changing this outside of the corresponding library parts.

1) We continue to replace with our own implementation, and extend that to also cover . We'd need to know which intrinsics should be provided by

That set is constantly changing; includes and we have not before considered 'promoting' an intrinsic from one to the other a breaking change, but it has broken bits here.

I'd imagine that most of the time, things in your are consistent with things in our , since (other than the and related stuff) it's mostly just a bunch of extern "C" function declarations that will typically have the exact same signature in your <intrin*.h> and in our . But because in this instance you're (presumably) declaring one of the functions in , and those aren't just function declarations, this is a more risky change.

Quuxplusone commented 4 years ago
>So if we're not pretending to be MSVC we don't define anything and just pass
on to the next include.
>
>I think _MSC_VER being defined is controlled by -fms-compatibility-version.

Hmmmm I see now that clang also overlays the headers intrin.h ends up also
including, and... we're not generally prepared for that. I don't think in
general it is reasonable for the VC Libraries to support this condition where
some of the intrinsics infrastructure is being replaced but other parts are
not. We have no contract that the different headers from our implementation are
only implemented in terms of the publicly documented portions of the other
headers, and intrin0.h is just the first one we noticed because it's the first
one substantially changed since we officially started supporting Clang.

One wouldn't expect our <vector> to work with libc++'s <algorithm>, for
example, and that's the kind of swiss cheese / sponge we've got here.

We can absolutely fix our intrinsics headers to step out of the way as needed
by Clang; please let us know what you would like to do in that case.
Alternately, Clang can just not load our intrinsics headers at all and we will
fix the standard libraries to not implement the intrin0.h optimization there.
Quuxplusone commented 4 years ago

Can you check whether Clang is defining _MSC_VER in your test environment?

It is.

I'm not sure I understand. If (when compiling with Clang) your directly uses Clang's __builtin_popcount and friends, and doesn't include , then I would think that nothing that uses should be clobbered by . What am I missing?

Sorry, what I mean is I tried to make the content of intrin0.h be "if clang, then #include because clang will replace that". But clang is recursively relying on our to provide _InterlockedAdd (for example) when it does include_next, because we declare _InterlockedAdd only in intrin0.h.

But because in this instance you're (presumably) declaring one of the functions in , and those aren't just function declarations, this is a more risky change.

Right in this particular example it came from immintrin but it highlights a contract problem in general here; that we need to make it easier for Clang to provide these bits without 'guessing' at what the contents of our intrin headers will be, and Clang needs to not try to replace the intrinsics package piecemeal.

Quuxplusone commented 4 years ago
OK. So:

We need to implement <immintrin.h> and related headers ourselves. Those are
implemented in terms of private compiler intrinsics which we don't want
anything outside those headers to use, and I don't think we would want to
support whatever mechanism your <immintrin.h> uses to communicate with cl.exe.
This is an abstraction layer defined by an Intel spec, that's intended to be
implemented by the compiler.

However, <intrin.h> is not an Intel thing, it's an MSVC thing. Perhaps we could
remove our <intrin.h> entirely and use yours instead; I'm not sure. There was
presumably a reason why we implemented our own instead of just using the
platform <intrin.h>, but it might be historical at this point.

That said, if I'm understanding the discussion correctly, the new intrinsics
that you want to add to <intrin0.h> are the _tzcnt_* ones. Those are from
<immintrin.h>, which is the compiler's domain not the standard library's. As
such, I don't think it's appropriate for _tzcnt_u32 to be declared anywhere
other than by the compiler's own builtin headers.

So if we want _tzcnt_* to be provided by <intrin0.h>, I think the choice is
either that

1) <intrin.h> is not a compiler builtin header -- and Clang shouldn't be
providing a copy of it -- in which case the only way it can get access to
_tzcnt_u32 would be by including <immintrin.h> (which you quite reasonably
don't want to do because that header is huge), or

2) <intrin.h> is a compiler builtin header, in which case Clang needs to
implement the whole thing, and needs to also implement <intrin0.h> and provide
the contents for it.

Option (2) seems preferable to me. While we can't guess which of the MSVC-
specific functions will get moved from <intrin.h> to <intrin0.h>, we can
conservatively put *all* of them in <intrin0.h>. Given that you want to put
_tzcnt_u32 in <intrin0.h> too, that presumably means also including
<bmiintrin.h> from our <intrin0.h>.

That all seems doable to me.

Would it be possible for you to try https://reviews.llvm.org/D85699 and see if
it fixes the problem for you?
Quuxplusone commented 4 years ago
>We need to implement <immintrin.h> and related headers ourselves.

That makes sense to me.

>However, <intrin.h> is not an Intel thing, it's an MSVC thing. Perhaps we
could remove our <intrin.h> entirely and use yours instead; I'm not sure. There
was presumably a reason why we implemented our own instead of just using the
platform <intrin.h>, but it might be historical at this point.

Whether you use ours or not isn't really an issue, just hopefully want to get
to a place where we aren't 'mixing'.

>That said, if I'm understanding the discussion correctly, the new intrinsics
that you want to add to <intrin0.h> are the _tzcnt_* ones.

Hmmm yes and no. The VS 16.8 STL needs to work with Clang 10, so the cat's
already out of the bag. We will need to workaround the problem by declaring
those intrinsics in intrin0.h only for MSVC and pulling in the full intrin.h
for clang.

What I want is to figure out what protocol/contract we need to prevent similar
cats from escaping in the future. That could be just contributing the contents
of our intrin.h to the LLVM project so that the include_next gets removed, and
the in the future they get maintained independently, with clang adding
declarations as you all implement them.

>Option (2) seems preferable to me. While we can't guess which of the MSVC-
specific functions will get moved from <intrin.h> to <intrin0.h>, we can
conservatively put *all* of them in <intrin0.h>. Given that you want to put
_tzcnt_u32 in <intrin0.h> too, that presumably means also including
<bmiintrin.h> from our <intrin0.h>.

Right, I was going to just completely defer to clang's intrin.h and skip the
intrin0 part completely, but was foiled by that include_next. The include_next
is really the source of the pain here as it means we can't fix the problem in
either our sources or LLVM's, we need to make a coordinated change in both
because the resulting TU is blurred between our release vehicles. Do you want
us to submit a change that effectively makes your intrin.h and our intrin.h
identical for further review? (I'll strip out declarations you already have)
That lets us get rid of the problematic include_next.
Quuxplusone commented 4 years ago

If _MSC_VER is being defined there shouldn’t be an include_next happening in clang’s header.

Quuxplusone commented 4 years ago
(In reply to Craig Topper from comment #10)
> If _MSC_VER is being defined there shouldn’t be an include_next happening in
> clang’s header.

Hmmmmm did I get confused with double negatives? I probably got confused with
double negatives. I just tried this again and maybe it'll work. Will keep you
posted.
Quuxplusone commented 4 years ago

OK, that seemed to work, as long as we have assurance that this include_next (and similar) won't ever engage on Windows just dummying out and making include seeeeeeems to work for now.

It still seems like we need a checklist or something for communicating with you folks when the set of intrinsics changes.

Quuxplusone commented 3 years ago
some _InterlockedCompareExchange128 intrinsic functions are missing from
intrin.h.
e.g. _InterlockedCompareExchange128_nf used by Microsoft STL <atomic> (see
https://github.com/microsoft/STL/blob/master/stl/inc/atomic#L483) is missing.

https://docs.microsoft.com/en-us/cpp/intrinsics/interlockedcompareexchange128?view=msvc-160
https://github.com/microsoft/STL/issues/1491

currently intrin.h only contains _InterlockedCompareExchange128 and
_InterlockedCompareExchange128_np:
https://github.com/llvm/llvm-project/blob/master/clang/lib/Headers/intrin.h#L217
Quuxplusone commented 3 years ago
These days, intrin.h mostly just provides declarations of builtins, and doesn't
provide as many implementations. Being present in the header isn't enough. We
also need an implementation in the compiler, and, surprise surprise, most MSVC
intrinsics that were not necessary have not been implemented.
_InterlockedCompareExchange128_nf is one such unimplemented intrinsic. I'll go
ahead and look into implementing it.

---

Getting back to Billy's original concern, I would like to try to improve on the
header shadowing situation. Can we set up a time to talk about this?

This is the list of headers that clang shadows:

$ ls /c/Program\ Files\ \(x86\)/Microsoft\ Visual\
Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/include/ > vc-headers.txt

$ ls ../clang/lib/Headers/ > clang-headers.txt

$ comm -12 vc-headers.txt clang-headers.txt
ammintrin.h
arm64intr.h
armintr.h
emmintrin.h
immintrin.h
intrin.h
iso646.h
limits.h
mm3dnow.h
mmintrin.h
nmmintrin.h
pmmintrin.h
smmintrin.h
stdarg.h
stdbool.h
stdint.h
tmmintrin.h
vadefs.h
varargs.h
wmmintrin.h
xmmintrin.h

The vast majority are CPU vendor intrinsic headers, ARM and X86. Those are
obviously tightly coupled with the compiler, so clang needs to prefer its own
over MSVC's.

stdarg.h is normal, it is the compiler's implementation of the standard
va_start/va_arg macros. vadefs.h is unnecessary: we only need it because the
CRT insists on using __crt_va_start instead of the standard macros. If
Microsoft could change vadefs.h to use __builtin_va_* if __clang__ is defined,
we could remove vadefs.h.

stdbool.h, stdint.h, and limits.h handshake with compiler predefined macros.

That leaves intrin.h and intrin0.h. I would also like clang to benefit from the
intrin0.h optimization. I don't want to include every x86 intrinsic when using
the MSVC STL with clang. We could re-structure our headers so that all the MSVC-
specific builtins are declared in intrin0.h and have intrin.h include
x86intrin.h and intrin0.h. Would that work? What would you like to see here?

In summary, it seems like there is nothing for clang to do, except perhaps
around vadefs.h and intrin[0].h. Email me if you want to set up a time to
coordinate on this.
Quuxplusone commented 3 years ago
I have some patches that hopefully fix the case of the MSVC STL with arm64:
https://reviews.llvm.org/D92061
https://reviews.llvm.org/D92062

The first patch fixes a separate bug.
Quuxplusone commented 3 years ago
Just for the record, it looks like there's a workaround in:
https://github.com/microsoft/STL/issues/1300#issuecomment-718065833
and apparently the fix should land in MSVC 16.9