-
Hello
I am running on the following machine:
CPU: 12th Gen Intel(R) Core(TM) i7-12700
RAM: 32GB, speed: 4400MT/s
NVIDIA RTX A2000 12GB
The model is:
llama-2-7b-chat.Q6_K.gguf
And it takes a…
-
While trying to run ./examples/talk, I'm getting this error:
```
charon@charon:~/coding/open-source/not-contributing/whisper.cpp$ ./talk -p Sanata
whisper_init_from_file_with_params_no_state: loa…
-
We select the backend at build time by choosing CUDA, Vulkan, SYCL, etc. Wouldn't it be better to build with all the backends you want to support and then select one at runtime? It's literall…
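For illustration only, a minimal sketch of the idea in C, assuming a hypothetical registry of compiled-in backends selected by name at runtime; none of these names (`backend_entry`, `select_backend`, the init stubs) are ggml's real API:

```c
// Hypothetical sketch, not the real ggml API: every backend is compiled in,
// and one is chosen by name at runtime (e.g. from a CLI flag or env var).
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*backend_init_fn)(void); // returns 0 on success

typedef struct {
    const char     *name;
    backend_init_fn init;
} backend_entry;

// Stubs standing in for real backend initializers.
static int init_cpu(void)  { puts("backend: CPU");  return 0; }
static int init_cuda(void) { puts("backend: CUDA"); return 0; }

static const backend_entry g_backends[] = {
    { "cpu",  init_cpu  },
    { "cuda", init_cuda },
};

static int select_backend(const char *name) {
    for (size_t i = 0; i < sizeof(g_backends)/sizeof(g_backends[0]); i++) {
        if (strcmp(g_backends[i].name, name) == 0) {
            return g_backends[i].init();
        }
    }
    fprintf(stderr, "unknown backend '%s', falling back to CPU\n", name);
    return init_cpu();
}

int main(int argc, char **argv) {
    // e.g. ./app cuda
    return select_backend(argc > 1 ? argv[1] : "cpu");
}
```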
-
![image](https://user-images.githubusercontent.com/58569691/226115306-4b5bd805-2492-4dca-93a5-8d3775b85cb7.png)
After running the command, it loads the model and then does nothing and exits. Not sure …
-
Is T5 (mLongT5, FlanT5, etc.) being developed with GGML? We are looking to this port for encoder-decoder LLMs.
Thanks in advance for the great work.
Steve
-
Followed this guide : https://github.com/ggerganov/ggml/tree/master/examples/mpt
When I tried to cast the model to fp16, I got the following error:
```
* Loading part: pytorch_model-00001-of-00007.bin
Traceback…
```
-
### Summary
LLMs are a hot topic, and there are more and more frameworks to make LLM execution faster. WasmEdge has already integrated [llama.cpp](https://github.com/ggerganov/llama.cpp) as one of…
-
This is a big one.
The only reason we use BLAS is that we don't have an efficient implementation of `matrix x matrix` multiplication. Naively doing parallel dot products is not optimal. We need to impl…
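As a rough illustration of the direction, a minimal cache-blocked `matrix x matrix` sketch in plain C: processing small tiles keeps the working set in cache, unlike row-by-row parallel dot products, which re-stream all of B for every output row. A real kernel would add SIMD, packing, and threading on top; treat this only as the idea, not ggml's actual implementation:

```c
#include <stddef.h>

#define TILE 32

// C[m x n] += A[m x k] * B[k x n], all row-major; caller zero-initializes C.
void matmul_blocked(size_t m, size_t n, size_t k,
                    const float *A, const float *B, float *C) {
    // Iterate over TILE x TILE blocks so A, B, C tiles stay cache-resident.
    for (size_t i0 = 0; i0 < m; i0 += TILE)
    for (size_t j0 = 0; j0 < n; j0 += TILE)
    for (size_t p0 = 0; p0 < k; p0 += TILE)
        for (size_t i = i0; i < i0 + TILE && i < m; i++)
        for (size_t p = p0; p < p0 + TILE && p < k; p++) {
            float a = A[i*k + p];
            // Innermost loop streams contiguously through B and C.
            for (size_t j = j0; j < j0 + TILE && j < n; j++)
                C[i*n + j] += a * B[p*n + j];
        }
}
```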
-
Thanks for your great work. The text2image mode works fine, but I ran into an error when using image2image mode. Any suggestions?
```bash
(mlc)- stable-diffusion.cpp % ./cmake-build-debug/bin/sd --mode i…
-
### .env
```
# Generic
TEXT_EMBEDDINGS_MODEL=sentence-transformers/all-MiniLM-L6-v2
TEXT_EMBEDDINGS_MODEL_TYPE=HF # LlamaCpp or HF
USE_MLOCK=false
# Ingestion
PERSIST_DIRECTORY=db
DOCUMENTS…