auto-vectorization Search Results

1000+ results for auto-vectorization


pytorch/pytorch #124913

[inductor][cpu]shufflenet_v2_x1_0 QAT performance regression…

### 🐛 Describe the bug shufflenet_v2_x1_0 QAT performance regression model_name qat_new qat_old qat ratio(new/old) shufflenet_v2_x1_…

zxd1997066 updated 6 months ago
2
privacy-scaling-explorations/mpz #84

Improve `Block` with SIMD

Right now the backing type of `Block` is `[u8; 16]` as it was simple to start with. However, core arrays can not always take advantage of auto-vectorization/SIMD for important operations. For examp…

sinui0 updated 1 year ago
2
KonduitAI/deeplearning4j #316

NEC vectorizer output issue

At this moment NCC has support for vectorization/inlining/parallelism reporting. But due to the format used it's not really helpful. Please consider this code: ``` template int64_t omp(T *x, ui…

raver119 updated 4 years ago
1
llvm/llvm-project #108600

[libc++] The representation of bounded iterators inhibits Cl…

With test code like this: ``` template __attribute__((noinline)) int testForLoop(Span span) { int sum = 0; for (auto it = span.begin() + 1; it < span.end(); ++it) { sum += *it - *(it …

ldionne updated 2 months ago
4
Ralith/hecs #351

Less than ideal codegen for iteration

The way query iteration is currently implemented leads to inefficient code. On my machine, `iterate_mut_100k` runs in around 29us. However, running the same iteration through explicit archetypes leads…

dragostis updated 1 year ago
2
pyro-ppl/pyro #2539

MCMC chains with batch dimension when using potential_fn [fe…

Hi, It seems like having multiple MCMC chains while using potential_fn is currently implemented with just running the chains separately. However, it'd seem more desirable to have an option to compu…

pimdh updated 4 years ago
1
thebennybox-Community/Community-Compiler #33

Vectorcall

Vectorcall is a calling convention for x86 much like fastcall in which vectors are allowed to be passed by the XMM, YMM, or ZMM registers to functions directly. This results in huge speedups, as it al…

davidgarland updated 6 years ago
2
Quuxplusone/LLVMBugzillaTest #45989

Reduction loop repeated due to loop invariant variable

Bugzilla Link: [PR47020](https://bugs.llvm.org/show_bug.cgi?id=47020); Status: NEW; Importance: P enhancemen…

Quuxplusone updated 4 years ago
2
llvm/llvm-project #41518

Support source-level region markers

Bugzilla Link: [42173](https://llvm.org/bz42173); Version: 8.0; OS: Linux; CC: @adibiagio,@gregbedwell,@RKSimon. Extended Description: The docs for llvm-mca (…

fe1f8067-743b-4361-8fe5-f444cfe5d0be updated 2 years ago
5
apache/arrow #40845

[C++][Parquet] Investigate optimizing level decoding

### Describe the enhancement requested Parquet C++ uses RLE encoding to encode the rep-level and def-levels. Benchmark can be seen here: https://github.com/apache/arrow/pull/39705#issuecomment-1921…

mapleFU updated 7 months ago
7
