-
Currently, Flash attention is available in the CUDA and Metal backends (#5021).
From the paper: Flash attention is an IO-aware exact attention algorithm that uses tiling to reduce the number of memory…
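To make the tiling idea concrete, here is a minimal single-query sketch in plain Python. It is an illustration only, not the CUDA or Metal kernels referenced above: the function names `naive_attention` and `tiled_attention` and the `tile_size` parameter are hypothetical, and the usual 1/sqrt(d) score scaling is left out for brevity. The tiled variant visits keys and values one block at a time and keeps only a running maximum, a running normaliser, and an output accumulator, yet should return the same result as the reference (hence "exact").

```python
import math

def naive_attention(q, keys, values):
    """Reference: compute all scores first, then softmax-weight the values."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def tiled_attention(q, keys, values, tile_size=2):
    """Same output, but keys/values are processed tile by tile.

    The full score vector is never stored; earlier partial results are
    rescaled whenever a later tile raises the running maximum.
    """
    dim = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax denominator, relative to running_max
    acc = [0.0] * dim         # unnormalised output, relative to running_max
    for start in range(0, len(keys), tile_size):
        k_tile = keys[start:start + tile_size]
        v_tile = values[start:start + tile_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum.
        # On the first tile this is exp(-inf) == 0.0, which is harmless.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            running_sum += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max
    return [a / running_sum for a in acc]
```

A real kernel applies the same rescaling to blocks of queries at once and keeps the tiles in fast on-chip memory; the sketch only shows why processing in tiles does not change the result.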
-
The SYCL version shows warnings/errors in MKL when built with the `RelWithDebInfo` configuration. The error does not occur when built with the `Release` configuration.
## Reproduction steps
1. Follow steps ex…
-
# Prerequisites
I am running the latest code. Development is very rapid so there are no tagged versions as of now.
I carefully followed the [README.md](https://github.com/abetlen/llama-cpp-python/b…
DDXDB updated
1 month ago
-
The build completes, but running `build\bin\main.exe` afterwards produces:
```
main: build = 0 (unknown)
main: built with MSVC 19.39.33519.0 for
main: seed = 1708170078
llama_model_load: error…
```
DDXDB updated
14 hours ago
-
Using the example script:
```
./run-llama2.sh
:: initializing oneAPI environment ...
run-llama2.sh: BASH_VERSION = 5.2.26(1)-release
args: Using "$@" for setvars.sh arguments:
:: advisor -…
```
-
### Describe the bug
See: https://github.com/intel/llvm/actions/runs/8842488516/job/24281474191
- test_accessor - caused by a36e9f8969a5ad4346f84c925aa89e1a00128b7f. Probably the test uses the dep…
-
Are you planning to support SYCL for Core Ultra CPUs as well, leveraging the iGPU along with the CPU? This would make local runs faster for long-context RAG applications.
-
I am trying to compile the io-pipe example (io_streaming_one_pipe). I am using OFS 2024.1-1 with the example_afu tag ofs-2024.1-1.
I am getting deprecated-function warnings in LoopbackTest.hpp
https://g…
-
# Summary
I tried to run the matmul primitive, following the [official example](https://oneapi-src.github.io/oneDNN/page_matmul_example_cpp.html#doxid-matmul-example-cpp).
It failed with the following error:
```
derived_t…
```
-
# Summary
The functions rng::device::generate_single and rng::host::generate_single are missing.
I can see these functions in the code, but I cannot see them in the library after the build.
# Reproducer
```
#includ…
```