avx512 Search Results - Githubissues

1000+ results for avx512


OpenMathLib/OpenBLAS #1937

direct sgemm for AVX2

I've created a version of the direct sgemm code for AVX2 (it's shared with the AVX512 code with very limited ifdefs, so can compile from the same source). Question is if and how to integrate. The …

fenrus75 updated 4 weeks ago · 6 comments

vllm-project/vllm #6141

[Usage]: Internal server error when serving LoRA adapters wi…

### Your current environment ``` PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.3 LTS (x86_64) GCC versio…

ebi64 updated 1 month ago · 1 comment

jfalcou/eve #1510

[FEATURE] Widening int mul (VPMULUDQ) and overflow checked i…

You can see one approach implemented for 64 bit integers using highway here: https://gcc.godbolt.org/z/YWx3vaTET This needs the `MulEven`/VPMULUDQ instruction. The function multiplies two vectors …

chriselrod updated 1 year ago · 9 comments

stfc/PSycloneBench #42

NemoLite2D Fortran OpenMP optimisation

In the manual implementation of the OpenMP Nemolite code, there exists a comment: ``` ! We have to block here since sshn_t is used in the following loop. ! We could avoid this by altering the follo…

LonelyCat124 updated 4 years ago · 8 comments

JuliaGizmos/WebIO.jl #514

webio-jupyterlab-provider broken with JupyterLab v4

## The bug It appears that the `webio-jupyterlab-provider` extension is incompatible with JupyterLab v4.0.8 ## Context When I upgraded to JupyterLab v4.0.8 the `webio-jupyterlab-provider` stopped…

baumgold updated 2 months ago · 8 comments

oven-sh/bun #12746

Segmentation fault (Using zeromq)

### How can we reproduce the crash? import [zeromq](https://www.npmjs.com/package/zeromq) and create any socket ```ts import zmq from 'zeromq'; const sock = new zmq.Publisher(); ``` ### Relevant…

Araxeus updated 2 months ago · 1 comment

halide/Halide #8318

CUDA error: CUDA_ERROR_ILLEGAL_ADDRESS cuLaunchKernel failed

This 1-element (scalar) kernel works on CPU, but gives a `Error: CUDA error: CUDA_ERROR_ILLEGAL_ADDRESS cuLaunchKernel failed` on CUDA using both Li2018 and Anderson2021 autoschedulers. ```py impo…

jansel updated 3 months ago · 14 comments

kokkos/kokkos #7268

Serial backend: performance regression

The 4.4 release caused a performance regression on the Serial backend, for the Trilinos Intrepid2 Sierra test. Bisecting showed that the first commit with the regression was the merge of #7080. The …

brian-kelley updated 3 days ago · 28 comments

ggerganov/llama.cpp #9628

Bug: Failed to run qwen2-57b-a14b-instruct-fp16.

### What happened? I am trying to run Qwen2-57B-A14B-instruct, and I used llama-gguf-split to merge the gguf files from [Qwen/Qwen2-57B-A14B-Instruct-GGUF](https://huggingface.co/Qwen/Qwen2-57B-A14B-…

tang-t21 updated 3 weeks ago · 3 comments

scipy/scipy #20150

BUG: incorrect nearest neighbor search using workers=-1

### Describe your issue. If I call `KDTree().query()` with the `workers` argument set to `1` or `-1`, I get different results. The result using multiple workers is wrong. The error only a…

domist07 updated 5 months ago · 9 comments
