Missed 'compress' codegen opportunity

Quuxplusone commented 5 years ago


Bugzilla Link	PR42865
Status	NEW
Importance	P enhancement
Reported by	David Bolvansky (david.bolvansky@gmail.com)
Reported on	2019-08-01 14:20:17 -0700
Last modified on	2019-08-01 15:10:40 -0700
Version	trunk
Hardware	PC Linux
CC	craig.topper@gmail.com, hideki.saito@intel.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

int floatcompress(float* __restrict__ in, float* __restrict__ out, int N,
                  float T) {
  int n = 0;
  for (int i = 0; i < N; ++i) {
    if (in[i] > T) out[n++] = in[i];
  }
  return n;
}

int intcompress(int* __restrict__ in, int* __restrict__ out, int N,
                int T) {
  int n = 0;
  for (int i = 0; i < N; ++i) {
    if (in[i] > T) out[n++] = in[i];
  }
  return n;
}

Compiler flags: -Ofast -march=icelake-server

ICC uses 'vcompressps' / 'vpcompressd' for these loops; Clang's codegen should
be improved to use them too.

Current codegen: https://godbolt.org/z/eS733l
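
For reference, a hand-written sketch of roughly what is being asked for, using the AVX-512F compress-store intrinsic `_mm512_mask_compressstoreu_ps` (which corresponds to vcompressps). This is not ICC's or Clang's actual output. The names `floatcompress_scalar`, `floatcompress_avx512` and `floatcompress_dispatch` are made up for illustration, and the runtime check via `__builtin_cpu_supports` assumes GCC or Clang on x86-64; elsewhere the sketch falls back to the scalar loop.

```c
#include <stddef.h>

/* Scalar reference: same loop as floatcompress in the report. */
static int floatcompress_scalar(const float* __restrict__ in,
                                float* __restrict__ out, int N, float T) {
  int n = 0;
  for (int i = 0; i < N; ++i) {
    if (in[i] > T) out[n++] = in[i];
  }
  return n;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/* Illustrative AVX-512F version: 16 floats per iteration. */
__attribute__((target("avx512f")))
static int floatcompress_avx512(const float* __restrict__ in,
                                float* __restrict__ out, int N, float T) {
  const __m512 vt = _mm512_set1_ps(T);
  int n = 0;
  int i = 0;
  for (; i + 16 <= N; i += 16) {
    __m512 v = _mm512_loadu_ps(in + i);
    /* One mask bit per lane where in[i + lane] > T (false for NaN). */
    __mmask16 m = _mm512_cmp_ps_mask(v, vt, _CMP_GT_OQ);
    /* Store only the selected lanes, packed contiguously at out + n. */
    _mm512_mask_compressstoreu_ps(out + n, m, v);
    /* The output index advances by the number of selected lanes. */
    n += __builtin_popcount((unsigned)m);
  }
  /* Scalar tail for the remaining N % 16 elements. */
  for (; i < N; ++i) {
    if (in[i] > T) out[n++] = in[i];
  }
  return n;
}
#endif

/* Hypothetical dispatcher: use the AVX-512 path only if the CPU has it. */
int floatcompress_dispatch(const float* in, float* out, int N, float T) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  if (__builtin_cpu_supports("avx512f"))
    return floatcompress_avx512(in, out, N, T);
#endif
  return floatcompress_scalar(in, out, N, T);
}
```

The int variant would presumably look the same with `_mm512_cmpgt_epi32_mask` and `_mm512_mask_compressstoreu_epi32` (vpcompressd) in place of the float intrinsics.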

Quuxplusone commented 5 years ago

Compress and expand:
https://techdecoded.intel.io/resources/tuning-for-success-with-the-latest-simd-extensions-and-intel-advanced-vector-extensions-512/

Quuxplusone commented 5 years ago

I think this more likely needs to be handled by the loop vectorizer.
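
To illustrate why this looks like a vectorizer problem: `n` only advances on iterations where the condition holds, so the store address depends on how many earlier lanes passed the test. A portable scalar model of the loop shape a vectorizer would have to produce is sketched below. It assumes a vector width of 16 lanes; `VF` and `intcompress_blocked` are hypothetical names, and this is only a model of the transformation, not what LLVM emits.

```c
#include <stdint.h>

enum { VF = 16 }; /* assumed vector width in lanes */

/* Block-wise model of intcompress: per block, build a lane mask,
 * compress-store the selected lanes, then bump n by the lane count. */
int intcompress_blocked(const int* __restrict__ in, int* __restrict__ out,
                        int N, int T) {
  int n = 0;
  for (int i = 0; i < N; i += VF) {
    int len = (N - i < VF) ? (N - i) : VF; /* last block may be partial */

    /* Step 1: vector compare, one mask bit per lane. */
    uint32_t mask = 0;
    for (int l = 0; l < len; ++l)
      mask |= (uint32_t)(in[i + l] > T) << l;

    /* Step 2: compress-store, i.e. pack selected lanes at out + n
     * (this is the part vpcompressd does in one instruction). */
    int k = 0;
    for (int l = 0; l < len; ++l)
      if ((mask >> l) & 1u) out[n + k++] = in[i + l];

    /* Step 3: advance the output index by popcount(mask). */
    n += k;
  }
  return n;
}
```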

Missed 'compress' codegen opportunity #41835