-
I've created a version of the direct sgemm code for AVX2 (it's shared with the AVX512 code with very limited ifdefs, so both can compile from the same source).
The question is whether and how to integrate it.
The …
-
### Your current environment
```
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC versio…
-
You can see one approach implemented for 64 bit integers using highway here: https://gcc.godbolt.org/z/YWx3vaTET
This needs the `MulEven`/VPMULUDQ instruction.
The function multiplies two vectors …
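
For readers who can't open the godbolt link, here is a rough scalar model of how a 64-bit multiply is commonly assembled from 32x32->64 products of the kind `MulEven`/VPMULUDQ produces. This is not the linked code. It assumes the goal is the low 64 bits of each product, and the helper names (`mul_even_u32`, `mul_lo_u64`, `mul_lo_u64_lanes`) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul_even_u32(a: int, b: int) -> int:
    """Model one VPMULUDQ lane: multiply the low 32 bits of two 64-bit
    lanes and return the full 64-bit product (hypothetical helper)."""
    return (a & MASK32) * (b & MASK32)


def mul_lo_u64(a: int, b: int) -> int:
    """Low 64 bits of a * b, built only from 32x32->64 multiplies."""
    a_hi = a >> 32
    b_hi = b >> 32
    lo_lo = mul_even_u32(a, b)  # a_lo * b_lo
    # The cross terms land at bit 32, so only their low 32 bits survive.
    cross = mul_even_u32(a_hi, b) + mul_even_u32(a, b_hi)
    # a_hi * b_hi would be shifted by 64 bits, so it is dropped entirely.
    return (lo_lo + ((cross & MASK32) << 32)) & MASK64


def mul_lo_u64_lanes(xs, ys):
    """Apply the scalar model lane by lane, as a vector op would."""
    return [mul_lo_u64(x, y) for x, y in zip(xs, ys)]
```

In a real vector implementation the `>> 32` and `<< 32` steps would be per-lane 64-bit shifts, and the three `mul_even_u32` calls would be three multiply instructions.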
-
In the manual implementation of the OpenMP Nemolite code, there exists a comment:
```
! We have to block here since sshn_t is used in the following loop.
! We could avoid this by altering the follo…
-
## The bug
It appears that the `webio-jupyterlab-provider` extension is incompatible with JupyterLab v4.0.8.
## Context
When I upgraded to JupyterLab v4.0.8 the `webio-jupyterlab-provider` stopped…
-
### How can we reproduce the crash?
Import [zeromq](https://www.npmjs.com/package/zeromq) and create any socket:
```ts
import zmq from 'zeromq';
const sock = new zmq.Publisher();
```
### Relevant…
-
This 1-element (scalar) kernel works on CPU, but gives an `Error: CUDA error: CUDA_ERROR_ILLEGAL_ADDRESS cuLaunchKernel failed` on CUDA with both the Li2018 and Anderson2021 autoschedulers.
```py
impo…
-
The 4.4 release caused a performance regression on the Serial backend, for the Trilinos Intrepid2 Sierra test.
Bisecting showed that the first commit with the regression was the merge of #7080.
The …
-
### What happened?
I am trying to run Qwen2-57B-A14B-instruct, and I used llama-gguf-split to merge the gguf files from [Qwen/Qwen2-57B-A14B-Instruct-GGUF](https://huggingface.co/Qwen/Qwen2-57B-A14B-…
-
### Describe your issue.
If I use `KDTree().query()` with the `workers` argument set to `1` or `-1`, I get different results. The result with multiple workers is wrong. The error only a…