Open manisandro opened 2 years ago
I think the generic part of the SIMD implementation is quite broken, but is "accidentally" correct for architectures with a SIMD width of at least 128 bit (e.g. SSE) and vector types up to 256 bit (4 x double).
The SIMD implementation models wide types (width `N`) by splitting these into a "Lo" part with native width (`N1`), and a "Hi" part with the remainder (width `N2`), and does it iteratively until the last "Hi" element also fits into the native type. I.e. `N == N1 + N2`.
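To make the Lo/Hi model concrete, here is a minimal sketch of the recursive split. It is not netgen's code: `Wide`, `kNativeWidth`, and `parts` are hypothetical names, the native register is modelled as a plain `std::array`, and the native width is assumed to be 2 doubles (128 bit).

```cpp
#include <array>

// Hypothetical native SIMD width in lanes (assumption: 2 doubles = 128 bit).
constexpr int kNativeWidth = 2;

// Hypothetical wide vector of N doubles; the bool selects leaf vs. split.
template <int N, bool IsNative = (N <= kNativeWidth)>
struct Wide;

// Leaf: the whole vector fits into one native register (modelled as an array).
template <int N>
struct Wide<N, true> {
  static constexpr int parts = 1;
  std::array<double, N> data{};
  double get(int i) const { return data[i]; }
  void set(int i, double v) { data[i] = v; }
};

// Split: "Lo" takes the native width N1, "Hi" takes the remainder N2.
// "Hi" is itself a Wide<N2>, so the split repeats until it fits natively.
template <int N>
struct Wide<N, false> {
  static constexpr int N1 = kNativeWidth;
  static constexpr int N2 = N - N1;
  static_assert(N == N1 + N2, "Lo and Hi must cover all lanes");
  static constexpr int parts = 1 + Wide<N2>::parts;

  Wide<N1> lo;
  Wide<N2> hi;

  double get(int i) const { return i < N1 ? lo.get(i) : hi.get(i - N1); }
  void set(int i, double v) {
    if (i < N1) lo.set(i, v); else hi.set(i - N1, v);
  }
};
```

With these assumptions, `Wide<4>` splits evenly (`N1 == N2 == 2`), while `Wide<7>` splits as 2 + 5, then 2 + 3, then 2 + 1, so the "Hi" parts are generally not the same width as the "Lo" parts.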
Unfortunately, the `Unpack` implementation gets this wrong for the generic case:
https://github.com/NGSolve/netgen/blob/bdc738f87e8e4191de4552f4931ae31a6f526f41/libsrc/core/simd_generic.hpp#L703-L723
In the last branch (≃ `N > 2`), `Unpack` should return a `tuple<SIMD<double, 2*N1>, SIMD<double, 2*N1>, ...>`, and the tuple should have `N / N1` elements. For the (`N == 4`, `N1 == N2 == 2`) case, this matches the current implementation by chance.
The xsimd project has put some work into making their header library compatible with unsupported (generic) architectures. Would it be useful to refactor netgen's SIMD to outsource it to xsimd? (they still have some work to do in https://github.com/xtensor-stack/xsimd/issues/954)
I'm attempting to build netgen-6.2.2203 for Fedora [1]. I'm currently hitting a build failure on arches which fall back to `simd_generic.hpp`, specifically:
See [1] for full build logs on failing arches.
[1] https://koji.fedoraproject.org/koji/taskinfo?taskID=87307345