-
## Type of issue
- Thanks guys for this awesome work. I was curious to run llama3-8B on my personal CPU, and the performance is quite impressive (nearly 2x the speed of llama.cpp for the same model size on the same hardware).
…
-
# Bitnet 1.58 Groundwork
After some talks with Saroufim and the CUDA MODE team working on BitNet, we've outlined a strategy for implementing the BitNet 1.58 method in torch. This issue lays the groun…
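As a concrete starting point on the storage side, here is a minimal sketch (not from the issue itself) of packing ternary {-1, 0, +1} weights at 2 bits each, four values per `uint8` — the kind of dtype groundwork torch would need. The helper names are hypothetical, and it assumes the tensor's element count is a multiple of 4:

```python
import torch

def pack_ternary(w_q: torch.Tensor) -> torch.Tensor:
    # Map {-1, 0, +1} -> {0, 1, 2}, then pack four 2-bit values per uint8.
    # Assumes w_q.numel() is a multiple of 4.
    u = (w_q + 1).to(torch.uint8).flatten().reshape(-1, 4)
    return u[:, 0] | (u[:, 1] << 2) | (u[:, 2] << 4) | (u[:, 3] << 6)

def unpack_ternary(packed: torch.Tensor) -> torch.Tensor:
    # Inverse: recover a flat float tensor with values in {-1, 0, +1}.
    cols = [((packed >> s) & 3) for s in (0, 2, 4, 6)]
    return torch.stack(cols, dim=1).flatten().float() - 1.0

# Example: four ternary weights fit into a single byte.
packed = pack_ternary(torch.tensor([-1.0, 0.0, 1.0, 1.0]))
assert unpack_ternary(packed).tolist() == [-1.0, 0.0, 1.0, 1.0]
```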
-
First of all: CONGRATS ON YOUR AMAZING RESEARCH WORK.
Considering that this is using GGML and seems based directly on `llama.cpp`:
Why is this a separate project from `llama.cpp`, given that `llama.c…
-
Some of the most popular models provide weights in bfloat16, which unfortunately cannot be loaded on the CPU because `Matmul::eval_cpu` only supports float32.
I know CPU support is not a priority, but it …
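As a stopgap on the user side, the checkpoint can be upcast offline before loading. A minimal PyTorch sketch, with hypothetical file names:

```python
import torch

# Upcast every bfloat16 tensor to float32 so a float32-only CPU matmul
# path can consume the checkpoint. File names are placeholders.
state = torch.load("model_bf16.pth", map_location="cpu")
state = {
    k: v.float() if isinstance(v, torch.Tensor) and v.dtype == torch.bfloat16 else v
    for k, v in state.items()
}
torch.save(state, "model_fp32.pth")
```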
-
I am developing [llmchat.co](llmchat.co), an open-source, local-first chat interface. We do have integrations with Ollama and LM Studio, but one of the biggest hurdles that our initial users are telli…
-
A web interface designed for submitting queries and viewing real-time responses through a user-friendly UI. Built with Node.js for the frontend and a Python Socket server for backend processing, the s…
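For illustration, a minimal sketch of such a backend, assuming a plain TCP protocol (the actual project may use Socket.IO or WebSockets instead); the host, port, and canned response tokens are placeholders:

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 8765  # hypothetical address for the Python backend

def handle(conn: socket.socket) -> None:
    with conn:
        query = conn.recv(4096).decode("utf-8")
        # Placeholder: call the model here and stream tokens back as they arrive.
        for token in ("This ", "is ", "a ", "streamed ", "response."):
            conn.sendall(token.encode("utf-8"))

def serve() -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen()
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

if __name__ == "__main__":
    serve()
```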
-
The README states some requirements for the Python, CMake, and clang versions.
Currently the install/build process does not check whether the clang version requirement is satisfied, and Ubuntu, for example, comes with a…
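A minimal sketch of such a check, assuming clang is on PATH; the minimum version below is a placeholder for whatever the README actually requires:

```python
import re
import shutil
import subprocess
import sys

MIN_CLANG = 18  # placeholder: substitute the version the README requires

def clang_major_version():
    """Return clang's major version, or None if clang is not on PATH."""
    path = shutil.which("clang")
    if path is None:
        return None
    out = subprocess.run([path, "--version"], capture_output=True, text=True).stdout
    match = re.search(r"clang version (\d+)", out)
    return int(match.group(1)) if match else None

version = clang_major_version()
if version is None or version < MIN_CLANG:
    sys.exit(f"clang >= {MIN_CLANG} is required, found: {version}")
```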
-
Hello.
First of all, thanks for sharing the BitNet training code.
I have a question about GPU memory usage.
As I understand it, BitNet can reduce VRAM usage compared to fp16/bf16 precision.
Howev…
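For context: the reduction comes from packed ternary weights at inference time; during training, the latent weights (and optimizer state) are kept in full precision, so training VRAM is not reduced. A back-of-envelope sketch of the inference-time weight memory, for a hypothetical 8B-parameter model:

```python
# Weight memory only; activations, KV cache, and optimizer state excluded.
params = 8e9
fp16_gib = params * 2 / 2**30        # fp16/bf16: 2 bytes per weight
ternary_gib = params * 0.25 / 2**30  # ternary packed at 2 bits = 0.25 bytes per weight
print(f"fp16/bf16: {fp16_gib:.1f} GiB, packed ternary: {ternary_gib:.1f} GiB")
```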
-
Seems like an absolutely awesome project. I do a lot of domain-expert LLM fine-tuning, so this would be amazing to have in my work. What has to be done to get this into common inference engines like lcp…
-
The [Training Tips, Code and FAQ](https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf) specifies that `BitLinear` has different `forward()` definiti…
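For reference, the training-time `forward()` in that PDF is along these lines (paraphrased here; the PDF's version also applies a norm before activation quantization, omitted for brevity). The straight-through estimator keeps gradients flowing to the full-precision latent weights, while inference can instead materialize the quantized weights offline:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def activation_quant(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization of activations to 8 bits.
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp_(min=1e-5)
    return (x * scale).round().clamp_(-128, 127) / scale

def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Absmean ternary quantization of weights to {-1, 0, +1}.
    scale = 1.0 / w.abs().mean().clamp_(min=1e-5)
    return (w * scale).round().clamp_(-1, 1) / scale

class BitLinear(nn.Linear):
    """Training-time forward: the straight-through estimator passes
    gradients to the full-precision latent weights."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        w_quant = w + (weight_quant(w) - w).detach()
        x_quant = x + (activation_quant(x) - x).detach()
        return F.linear(x_quant, w_quant)
```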