VcDevel / std-simd

std::experimental::simd for GCC [ISO/IEC TS 19570:2018]

status of AVX-512 and Vc #22

Open maierbn opened 3 years ago

maierbn commented 3 years ago

Hi, I have been using Vc with our biomechanics code, where it proved very beneficial: around 30% of peak performance even in the non-academic scenarios :+1: Now it would be nice to be able to use AVX-512 on an Intel Cascade Lake processor (using GCC 10.2, Ubuntu 20.04). I'm not really up to date with the status of Vc and std-simd, so I thought I'd just ask here about it:

  1. What is the status of Vc in terms of AVX-512? v1.4 doesn't support it. I checked out the branch mkretz/datapar and could run simple AVX-512 arithmetic; however, things like Vc::abs, Vc::exp, Vc::iif do not seem to exist. I tried a merge of the branch mkretz/datapar into 1.4, which gave numerous conflicts. A lot of them were files that were renamed differently, but some conflicts were also in the code. I am completely lost here and not able to do this merge.
  2. The alternative would be to switch to std-simd. How stable and feature-complete is this code base already? My use case would be an inclusion of the headers instead of requiring our users to patch and rebuild their GCC compilers. I would probably start with defining a compatibility layer such as
    #include <experimental/simd>

    namespace Vc
    {
    template<typename T, int n>
    using array = std::experimental::fixed_size_simd<T, n>;
    using double_v = std::experimental::native_simd<double>;
    }

    The symbols of Vc used in our code are: Vc::double_v, Vc::int_v and *::size(), Vc::Zero, Vc::One, Vc::Allocator, Vc::abs, Vc::log, Vc::exp, Vc::isnegative, Vc::isfinite, Vc::all_of, Vc::any_of, Vc::where.
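
A slightly fuller sketch of such a layer, covering most of the symbols listed above, might look like the following. It assumes a libstdc++ that ships `<experimental/simd>` (or std-simd installed on the include path); the mapping of `Vc::isnegative` to `signbit` is an assumption of this sketch, not something Vc or std-simd prescribes, and `Vc::Zero`, `Vc::One` and `Vc::Allocator` are left out.

```cpp
#include <experimental/simd>

namespace Vc
{
namespace stdx = std::experimental;

// Native-width vectors. Note that the lane counts of int_v and double_v
// generally differ, so mixed int/double code may need fixed_size_simd.
using double_v = stdx::native_simd<double>;
using int_v = stdx::native_simd<int>;

// Re-export the TS free functions under the Vc name.
using stdx::abs;
using stdx::exp;
using stdx::log;
using stdx::isfinite;
using stdx::all_of;
using stdx::any_of;
using stdx::where;

// Assumption of this sketch: isnegative is approximated by signbit,
// which also reports true for -0.0.
template <typename T, typename Abi>
stdx::simd_mask<T, Abi> isnegative(const stdx::simd<T, Abi>& x)
{
    return stdx::signbit(x);
}
} // namespace Vc
```

With this in place, code such as `Vc::where(x < 0.0, x) = 1.0;` or `Vc::all_of(Vc::isfinite(x))` compiles against std-simd without touching the call sites.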

I really appreciate your efforts for getting this into the C++ standard in the long term. Maybe it would be nice to have a short-term solution building on Vc, too? (At least a subset for Intel Cascade Lake.) Would it be possible for somebody to implement a Vc 1.5 with AVX-512? Imho the big advantage of Vc from a user's perspective is its good documentation. Or is std-simd already on the same level as Vc feature-wise and only the documentation is not yet there? Then, could somebody write me the mentioned compatibility layer or guide me in the right direction?

mattkretz commented 3 years ago

Hi. First of all, I'm happy to hear that you had good success with Vc. And I understand that you want to make use of the portability promise now and sadly it isn't there. :disappointed: The reason is that it became too expensive to maintain more SIMD variants without C++17 (constexpr-if is immensely helpful) and without making much more use of GCC vector builtins, which are not available on MSVC. Once I took that step, the implementation had to be very different. Which is why there's no way to merge the std-simd code back to Vc 1.4.

std-simd is a complete implementation (modulo bugs) of the Parallelism TS 2 simd specification. It's a different feature set from Vc 1.4. But it certainly misses some of the high level APIs. The API is stable (it's an ISO specification...) but the ABI not necessarily. For most use cases this means it is stable.

Of course you can install std-simd at a different location than the libstdc++ path. The latter is just to make it work out-of-the-box as would be expected from a TS implementation.

abs and exp should work without namespace qualification for std-simd. Though exp is not vectorized in std-simd yet. iif is not part of the TS because I hope to use operator?: instead. Writing your own iif is hard if you require full generality, but making it work for your own codebase is simple. Just implement it using where. The replacements for Vc::Zero and Vc::One are simply 0 and 1. Compilers are smart enough now to optimize it properly. Vc::Allocator is almost unnecessary with C++17 since new doesn't ignore overalignment anymore.

Documentation is an issue, yes. Did you find https://en.cppreference.com/w/cpp/experimental/simd ? Feel free to bug me for filling in missing documentation.
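
A minimal sketch of the "implement iif using where" suggestion, restricted to the case where both branches are already `simd` objects of the same type. The helper names `iif` and `nonnegative_indicator` are hypothetical and chosen for illustration; this is not the fully general version that the comment calls hard. The second function also shows plain `0` and `1` standing in for `Vc::Zero` and `Vc::One`.

```cpp
#include <experimental/simd>

namespace stdx = std::experimental;

// Hypothetical helper: per-lane select. Lanes where mask is true take
// if_true, all other lanes take if_false.
template <typename T, typename Abi>
stdx::simd<T, Abi> iif(const stdx::simd_mask<T, Abi>& mask,
                       const stdx::simd<T, Abi>& if_true,
                       const stdx::simd<T, Abi>& if_false)
{
    stdx::simd<T, Abi> result = if_false;
    stdx::where(mask, result) = if_true;  // masked assignment
    return result;
}

// Hypothetical usage: 0 for negative lanes, 1 for the rest.
// The scalars 0 and 1 are broadcast to all lanes.
inline stdx::native_simd<double>
nonnegative_indicator(const stdx::native_simd<double>& x)
{
    const stdx::native_simd<double> zero = 0;
    const stdx::native_simd<double> one = 1;
    return iif(x < 0, zero, one);
}
```

Because all three parameters are deduced from `simd` types, passing a bare scalar as a branch (for example `iif(x < 0, 0.0, x)`) does not compile with this sketch; extra overloads would be needed for that.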

maierbn commented 3 years ago

Thank you for the explanations. Now I know what I will use in my future projects :smiley: The reference "skeleton" that you linked was helpful. I approached my existing code base with a quick-and-dirty wrapper such that I can now switch between Vc and std-simd depending on the compiler. At first, std-simd was slower than Vc despite the AVX-512, but then I noticed that we have a lot of exp and log calls. After numerically approximating them accurately enough using arithmetic (Taylor), it's now nice and fast :rocket:
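
The thread does not show the approximation that was actually used. As an illustration of the idea only, here is a sketch that evaluates a truncated Taylor polynomial of exp with plain `simd` arithmetic. The function name `exp_taylor`, the degree (8) and the assumed input range (roughly |x| <= 0.5, where the truncation error stays below about 1e-8) are all assumptions of this sketch.

```cpp
#include <experimental/simd>

namespace stdx = std::experimental;

// Hypothetical helper: degree-8 Taylor polynomial of exp around 0,
// evaluated lane-wise in Horner form. Only reasonable for small |x|;
// no range reduction is performed.
template <typename Abi>
stdx::simd<double, Abi> exp_taylor(const stdx::simd<double, Abi>& x)
{
    // Coefficients 1/k! for k = 8, 7, ..., 0.
    constexpr double c[9] = {1.0 / 40320, 1.0 / 5040, 1.0 / 720,
                             1.0 / 120,   1.0 / 24,   1.0 / 6,
                             0.5,         1.0,        1.0};
    stdx::simd<double, Abi> r = c[0];
    for (int k = 1; k < 9; ++k) {
        r = r * x + c[k];  // scalar coefficient is broadcast to all lanes
    }
    return r;
}
```

Inputs outside the assumed range lose accuracy quickly, so a real code base would either guarantee the range or add a range-reduction step first.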

mattkretz commented 3 years ago

Right, exp and log are next on my list for vectorization in std-simd. Note however that I'll have to implement them with high precision in the complete input range. So there will still be room for performance improvement with your own implementation (reduced input range and reduced output precision are the most relevant parameters).

mattkretz commented 3 years ago

I just pushed hmin and hmax. I have an exp implementation waiting in another dev branch; I still need to benchmark it. It's precise with a max error of 1ULP.
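
For reference, a short usage sketch of the horizontal reductions, assuming a std-simd or libstdc++ version that already provides `hmin` and `hmax` in `std::experimental`. The helper name `lane_range` is hypothetical.

```cpp
#include <experimental/simd>

namespace stdx = std::experimental;

// Hypothetical helper: difference between the largest and the smallest
// lane of a simd object, computed with the horizontal reductions.
template <typename T, typename Abi>
T lane_range(const stdx::simd<T, Abi>& v)
{
    return stdx::hmax(v) - stdx::hmin(v);
}
```

Both reductions return a scalar of the element type, so the result can be used directly in ordinary scalar code, for example in a convergence check.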

maierbn commented 3 years ago

Thank you for implementing hmin / hmax, I just pulled it into our codebase.