
Missed 'compress' codegen opportunity #41835

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR42865
Status NEW
Importance P enhancement
Reported by David Bolvansky (david.bolvansky@gmail.com)
Reported on 2019-08-01 14:20:17 -0700
Last modified on 2019-08-01 15:10:40 -0700
Version trunk
Hardware PC Linux
CC craig.topper@gmail.com, hideki.saito@intel.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
int floatcompress(float* __restrict__ in, float* __restrict__ out, int N,
                    float T) {
  int n = 0;
  for (int i = 0; i < N; ++i) {
    if (in[i] > T) out[n++] = in[i];
  }
  return n;
}

int intcompress(int* __restrict__ in, int* __restrict__ out, int N,
                    int T) {
  int n = 0;
  for (int i = 0; i < N; ++i) {
    if (in[i] > T) out[n++] = in[i];
  }
  return n;
}

Compiler flags: -Ofast -march=icelake-server

ICC uses the AVX-512 'vcompressps' / 'vpcompressd' instructions for these loops; Clang's codegen should be improved to use them too.

Current codegen: https://godbolt.org/z/eS733l
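A minimal hand-vectorized sketch of floatcompress using AVX-512 intrinsics from <immintrin.h> may help show the idea. It illustrates the compress-store approach and is not ICC's actual output. The name floatcompress_avx512 is hypothetical. The vector path is compiled only when __AVX512F__ is defined (for example with -march=icelake-server). Otherwise the scalar loop handles every element. The sketch assumes GCC or Clang for __builtin_popcount.

```c
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical hand-vectorized floatcompress (sketch, not compiler output).
   With AVX-512F, each 16-float chunk is compared against T and the selected
   lanes are packed contiguously into out[n..] with a compress store. */
int floatcompress_avx512(const float *__restrict__ in, float *__restrict__ out,
                         int N, float T) {
  int n = 0;
  int i = 0;
#ifdef __AVX512F__
  const __m512 vT = _mm512_set1_ps(T);
  for (; i + 16 <= N; i += 16) {
    __m512 v = _mm512_loadu_ps(in + i);
    /* Bit j of k is set when in[i + j] > T (ordered, quiet compare). */
    __mmask16 k = _mm512_cmp_ps_mask(v, vT, _CMP_GT_OQ);
    /* Store only the selected lanes, back to back, starting at out + n. */
    _mm512_mask_compressstoreu_ps(out + n, k, v);
    /* The output index advances by the number of selected lanes. */
    n += __builtin_popcount((unsigned)k);
  }
#endif
  /* Scalar remainder (or the whole loop when AVX-512F is unavailable). */
  for (; i < N; ++i) {
    if (in[i] > T) out[n++] = in[i];
  }
  return n;
}
```

The int version would have the same shape, using _mm512_cmpgt_epi32_mask and _mm512_mask_compressstoreu_epi32. The key step is that n advances by the popcount of the comparison mask for each 16-element chunk. That turns the conditional, loop-carried store index into something a vector loop can handle.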
Quuxplusone commented 5 years ago
Background on the AVX-512 compress and expand instructions:
https://techdecoded.intel.io/resources/tuning-for-success-with-the-latest-simd-extensions-and-intel-advanced-vector-extensions-512/
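The following scalar model makes the semantics concrete. It implements compress and its inverse, expand, on a 16-lane float vector with a 16-bit mask. The helper names compress16 and expand16 are hypothetical. They model only the data movement of vcompressps / vexpandps, not the instruction encodings or masking variants.

```c
#include <stdint.h>

#define LANES 16

/* Hypothetical scalar model of compress (cf. vcompressps / vpcompressd):
   lanes of src whose mask bit is set are packed, in order, into the low
   lanes of dst. Remaining dst lanes are left untouched here; the register
   forms of the real instructions zero or merge them depending on the
   masking mode, and the memory form stores only the selected elements. */
static int compress16(const float src[LANES], uint16_t mask, float dst[LANES]) {
  int n = 0;
  for (int j = 0; j < LANES; ++j)
    if (mask & (1u << j)) dst[n++] = src[j];
  return n; /* number of lanes written */
}

/* Hypothetical scalar model of expand (cf. vexpandps): consecutive elements
   from the low end of src are placed into the lanes whose mask bit is set;
   unselected dst lanes are left untouched in this model. */
static void expand16(const float src[LANES], uint16_t mask, float dst[LANES]) {
  int n = 0;
  for (int j = 0; j < LANES; ++j)
    if (mask & (1u << j)) dst[j] = src[n++];
}
```

Compressing with a mask and then expanding with the same mask returns each selected element to its original lane. The loops in this report only need the compress half.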
Quuxplusone commented 5 years ago

I think this more likely needs to be handled by the loop vectorizer.