avx512 Search Results - Githubissues

1000+ results
for avx512

intel/torch-xpu-ops #1071

AssertionError: "Simulate error" does not match "grad can be…

### 🐛 Describe the bug Sometimes there is an error 'AssertionError: "Simulate error" does not match "grad can be implicitly created only for scalar outputs"' in case: test_autograd_xpu.py::TestAutogr…

PenghuiCheng updated 3 days ago
2
intel/xFasterTransformer #480

Qwen2.5-0.5B-Instruct quantization with gptq error

xft version: 1.8.2 lscpu: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 52 bits physical, 48 bits virtual Byte Order: Little End…

wcollin updated 1 month ago
1
pytorch/pytorch #139298

CUDNN sdp attention causes loss explosion

### 🐛 Describe the bug We observed a NaN regression with 2.5.0, and traced it to CUDNN attention. 2.5.0: ![Screenshot_20241030_095725](https://github.com/user-attachments/assets/5e83ecc4-8f0c-46b…

ad8e updated 4 days ago
22
pytorch/pytorch #140514

Error with fused AdamW

### 🐛 Describe the bug ``` File "/mnt/clusterstorage/workspace/kevin/ml-monorepo/chadfusion/train_fsdp.py", line 363, in fsdp_train scaler.step(opt) File "/usr/local/lib/python3.10/dist-…

ad8e updated 2 weeks ago
5
post-quantum-cryptography/CECPQ2b #8

add AVX512 implementation

kriskwiatkowski updated 5 years ago
11
pytorch/pytorch #138800

Batching rule not defined for `aten::_make_dual`.

### 🐛 Describe the bug I am trying to call `torch.vmap` on `torch.jacfwd`. This works fine normally but raises the following error when called under `torch.inference_mode()`. ``` File [...]/torch…

keunhong updated 1 month ago
1
meta-llama/llama-stack #385

Could not find conda environment: llamastack-local when runn…

### System Info ``` ~/work/llama-stack/distributions/meta-reference-gpu (main)]$ python -m "torch.utils.collect_env" /home/kaiwu/.conda/envs/llamastack-meta-reference-gpu/lib/python3.10/runpy.py:12…

wukaixingxp updated 3 weeks ago
1
intel/intel-extension-for-pytorch #701

Run finetune.py on Xeon but failed - no attribute "weight"

### Describe the bug To finetune model on Xeon CPU, we are following the [ai-reference-models/models_v2/pytorch/llama/training/cpu at main · intel/ai-reference-models (github.com)](https://github.com…

JamieVC updated 1 week ago
12
mratsim/constantine #427

SIMD Vectorization - Use Integer Fused Multiply-Add (AVX512)

It might be quite interesting to explore SIMD vectorization for elliptic curves and MSMs. This might significantly speed-up: - Verkle Trees - KZG - MSM without needing a GPU. Ideally the same op…

mratsim updated 3 months ago
2
pytorch/pytorch #129358

Pytorch build fail with GCC 14.1.0 due to third_party/fbgemm…

### 🐛 Describe the bug Based on pytorch main branch commit https://github.com/pytorch/pytorch/commit/acfe237a71af609e837a34bb38048aa8acb8eb4d GCC 13.2.0: build pass GCC 14.1.0: build fail …

WangYutao1995 updated 1 week ago
4
