-
**Describe the Issue**
Mistral/NVIDIA recently released [Nemo 12B](https://mistral.ai/news/mistral-nemo/) and llama.cpp has [added support](https://github.com/ggerganov/llama.cpp/pull/8579) for its …
-
Migrate the Caffe2/MKL-DNN int8 operations to support the ATen/JIT backend and align with the qint8 direction in PyTorch/ATen.
**Motivation**
With Cascade Lake/VNNI, MKL-DNN int8 functions can speed up DL m…
-
### 🐛 Describe the bug
When trying to `torch.compile` a module that contains `torch.clear_autocast_cache`, we get the attached error. I believe this is expected, but am wondering if there is an establishe…
-
When running newer versions (from 3.3.0 and higher) with any model, the JVM crashes:
Extracted 'ggml.dll' to 'C:\Users\user\AppData\Local\Temp\ggml.dll'
Extracted 'llama.dll' to 'C:\Users\user\AppDat…
-
# Ask a Question
Since the GPU machines of CI have been upgraded from NV6 to T4, it looks like quantized models on GPU should be added too.
`Hardware support is required to achieve better performance with…
-
### 🐛 Describe the bug
Reproducing step:
1. enable `test/inductor/test_torchinductor_opinfo.py` with this PR:
https://github.com/pytorch/pytorch/pull/134556
2. `python test/inductor/test_torchin…
-
I’ve discovered a performance gap between the Neural Speed MatMul operator and the llama.cpp operator in the Neural-Speed repository. This issue was identified while running a benchmark with the ONNXR…
-
Add the following:
```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
```
to reduce some of these warnings, according to [Stack Overflow](https://stackoverflow.com/questions/66092421/how-to-rebui…
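One caveat worth noting: `TF_CPP_MIN_LOG_LEVEL` is read by TensorFlow's C++ backend when it first initializes, so the assignment must happen before the first `import tensorflow`. A minimal sketch (assuming TensorFlow is installed; the import itself is left commented out):

```python
import os

# The C++ log filter is read once, at TensorFlow initialization, so this
# assignment must come before the first `import tensorflow`.
# '0' = all messages, '1' = hide INFO, '2' = hide INFO and WARNING,
# '3' = hide everything except FATAL.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# import tensorflow as tf  # import only after the variable is set
```

Setting the variable in the shell (`export TF_CPP_MIN_LOG_LEVEL=2`) before launching Python achieves the same effect without relying on import order.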
-
### Describe the bug
To fine-tune a model on a Xeon CPU, we are following the [ai-reference-models/models_v2/pytorch/llama/training/cpu at main · intel/ai-reference-models (github.com)](https://github.com…
-
### Question
Hello, I have two questions:
**1. I used the same jsonl results, but the evaluated scores were different; the results are shown below.**
`
2023-05-29 09:07:24.546966: I tensorflow/c…