-
### 🐛 Describe the bug
shufflenet_v2_x1_0 QAT performance regression

| model_name | qat_new | qat_old | qat ratio(new/old) |
| --- | --- | --- | --- |
| shufflenet_v2_x1_… |
-
Right now the backing type of `Block` is `[u8; 16]`, as it was simple to start with. However, core arrays cannot always take advantage of auto-vectorization/SIMD for important operations.
For examp…
-
At the moment, NCC supports vectorization/inlining/parallelism reporting. However, because of the format used, the reports are not very helpful.
Please consider this code:
```
template <typename T>
int64_t omp(T *x, ui…
```
-
With test code like this:
```
template <typename Span>
__attribute__((noinline)) int testForLoop(Span span) {
  int sum = 0;
  for (auto it = span.begin() + 1; it < span.end(); ++it) {
    sum += *it - *(it …
```
-
The way query iteration is currently implemented leads to inefficient code. On my machine, `iterate_mut_100k` runs in around 29us. However, running the same iteration through explicit archetypes leads…
-
Hi,
It seems that running multiple MCMC chains with potential_fn is currently implemented by simply running the chains separately. However, it would seem more desirable to have an option to compu…
pimdh updated
4 years ago
-
Vectorcall is a calling convention for x86, much like fastcall, in which vector arguments can be passed to functions directly in the XMM, YMM, or ZMM registers. This results in huge speedups, as it al…
-
| | |
|--------------------|----|
| Bugzilla Link | [PR47020](https://bugs.llvm.org/show_bug.cgi?id=47020) |
| Status | NEW |
| Importance | P enhancemen…
-
| | |
| --- | --- |
| Bugzilla Link | [42173](https://llvm.org/bz42173) |
| Version | 8.0 |
| OS | Linux |
| CC | @adibiagio,@gregbedwell,@RKSimon |
## Extended Description
The docs for llvm-mca (…
-
### Describe the enhancement requested
Parquet C++ uses RLE encoding for the rep-levels and def-levels. A benchmark can be seen here: https://github.com/apache/arrow/pull/39705#issuecomment-1921…
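
For readers unfamiliar with the idea, here is a minimal sketch of run-length encoding applied to a sequence of level values. It is illustrative only: the names `Run`, `rle_encode_levels`, and `rle_decode_levels` are hypothetical and not part of the Parquet C++ API. It also shows plain run-length encoding, not the RLE/bit-packing hybrid that the Parquet format specifies.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper type: one run is (level value, repeat count).
using Run = std::pair<int16_t, uint32_t>;

// Collapse consecutive equal levels into (value, count) runs.
// Assumption: plain RLE only; real Parquet mixes RLE with bit-packed runs.
std::vector<Run> rle_encode_levels(const std::vector<int16_t>& levels) {
    std::vector<Run> runs;
    for (int16_t level : levels) {
        if (!runs.empty() && runs.back().first == level) {
            ++runs.back().second;          // extend the current run
        } else {
            runs.emplace_back(level, 1u);  // start a new run
        }
    }
    return runs;
}

// Expand (value, count) runs back into the original level sequence.
std::vector<int16_t> rle_decode_levels(const std::vector<Run>& runs) {
    std::vector<int16_t> levels;
    for (const Run& run : runs) {
        levels.insert(levels.end(), run.second, run.first);
    }
    return levels;
}
```

Long stretches of identical levels (for example, a required column where every definition level is the same) collapse to a single run, which is why the cost of encoding and decoding these runs shows up in benchmarks.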