-
Hi, I finally got it working, and I'm going to share my step-by-step guide to make this work.
#### My system:
RTX 3060 12GB
CUDA 12.1
Windows 10
PhpStorm 2023.2.4
## Step 1 - Install TGI
Follow the…
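The guide is truncated above, but once TGI is running, a quick sanity check against its `/generate` endpoint looks roughly like this (the port and prompt are placeholders — adjust to your install):

```python
import json
import urllib.request

def build_tgi_request(prompt, max_new_tokens=32, base_url="http://localhost:8080"):
    """Build an HTTP request for TGI's /generate endpoint."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_tgi_request("def hello():")
# urllib.request.urlopen(req) returns a JSON body with "generated_text"
# once the server is actually up.
```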
-
It would be _really_ neat if Willow could be configured with custom, user-defined integrations.
For example, I want to send the text that Willow infers from voice directly to a locally run LLM (AI…
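A rough sketch of what such an integration could do — forward the transcribed text to a locally hosted, OpenAI-compatible endpoint. All names and URLs here are hypothetical, not real Willow configuration:

```python
import json
import urllib.request

# Hypothetical hook: the function names, URL, and model name are
# illustrative, not part of Willow's actual config surface.
def build_chat_payload(transcript, model="local-model"):
    """Wrap transcribed text as an OpenAI-style chat request body."""
    return {"model": model, "messages": [{"role": "user", "content": transcript}]}

def forward_to_llm(transcript, base_url="http://localhost:8000/v1"):
    """Send the transcript to a local OpenAI-compatible server, return the reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(transcript)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```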
-
Model: https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF
I tested quant Q5_K_M
Using the defaults in koboldcpp_cu12 v1.68: CuBLAS with 0 layers offloaded, no flash attention.
Prompt: (p…
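For context on why 0 GPU layers is plausible on a 12 GB card, some rough arithmetic on the quant size. The ~16B total-parameter count and ~5.5 bits/weight for Q5_K_M are approximations, not exact figures:

```python
def approx_gguf_size_gb(n_params_billion, bits_per_weight):
    # Rough GGUF file-size estimate: parameters * bits / 8, in gigabytes.
    return n_params_billion * bits_per_weight / 8

# DeepSeek-Coder-V2-Lite has roughly 16B total parameters (MoE, ~2.4B active).
# Q5_K_M averages very roughly 5.5 bits per weight:
size_gb = approx_gguf_size_gb(16, 5.5)  # about 11 GB, near the 12 GB VRAM limit
```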
-
There is a model called SOLAR. This model follows the same architecture as LLaMA2, but it has more layers, which makes it an outstanding performer, better than Mistral and even Mixtral at some points (open …
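For reference, the extra depth comes from "depth up-scaling" as described in the SOLAR report: duplicate a 32-layer base model, drop 8 layers from the seam of each copy, and stack the rest. The numbers below follow that report:

```python
def depth_upscaled_layers(base_layers, dropped_per_copy):
    # Depth up-scaling: stack two copies of the base model's layers,
    # dropping some layers from the seam of each copy.
    return 2 * (base_layers - dropped_per_copy)

layers = depth_upscaled_layers(32, 8)  # 48 layers, SOLAR 10.7B's depth
```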
-
**Description**
I am deploying a 7B open-source LLM on a Triton server with 32 GiB of memory, 8 CPUs, and 2 T4 GPUs. I also have some other code deployed as models (it doesn't take much CPU/memory) in the same co…
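For the two T4s, GPU placement per model is controlled by `instance_group` in that model's `config.pbtxt`. A sketch (model name and backend are illustrative) — note that this places one instance on each listed GPU; it does not by itself shard a 7B model across both cards, which depends on the backend:

```protobuf
# config.pbtxt (illustrative fragment)
name: "llm_7b"
backend: "python"
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```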
-
By supporting Ollama, it would be possible to use locally hosted LLMs, which would be quite privacy-friendly. I think this would pair quite nicely with Grafana's mission.
https://github.com/jmorganc…
ntimo updated 5 months ago
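For reference, Ollama's local REST API is a single HTTP endpoint, so the call such an integration would make is simple. A sketch (model name and host are placeholders):

```python
import json
import urllib.request

def build_ollama_request(prompt, model="llama2", host="http://localhost:11434"):
    """Build a request for Ollama's /api/generate endpoint (streaming disabled)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_ollama_request("Summarize this dashboard alert")
# urllib.request.urlopen(req) returns a JSON body with a "response" field
# once an Ollama server is running locally.
```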
-
[Wizardcoder](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0) is currently the best open-source code LLM. It seems to be based on StarCoder. How can I fine-tune this model?
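One common answer is parameter-efficient fine-tuning (e.g. LoRA via the `peft` library) rather than a full fine-tune of all 15B parameters, and the appeal is visible in the arithmetic. A sketch of how few parameters LoRA adds to a single linear layer (rank and dimensions are illustrative):

```python
def lora_trainable_params(d_in, d_out, rank):
    # LoRA freezes the original d_in x d_out weight and trains only two
    # low-rank factors: A (d_in x rank) and B (rank x d_out).
    return rank * (d_in + d_out)

full = 6144 * 6144                               # one full projection matrix
lora = lora_trainable_params(6144, 6144, rank=8) # 98,304 trainable params
# lora / full is well under 1%, which is why LoRA fine-tunes fit on modest GPUs.
```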
-
```shell
python build.py --model_dir ./bloom/560M/ --dtype float16 --use_gemm_plugin float16 --use_gpt_attention_plugin float16 --output_dir ./bl…
```
-
## Why do you need it?
**Who will use**:
Developers who build businesses on top of LLM models (通义千问/openai/gemini), i.e. developers hoping to create commercial value based on LLMs.
---
**What problem to solve*…
-
```shell
cat .env
LLM_NAME="Ollama"
OLLAMA_MODEL_NAME="qwen:7b"
OLLAMA_BASE_URL="http://192.168.2.205:11434"
MIN_RELEVANCE_SCORE=0.3
BOT_TOPIC="OpenIM"
URL_PREFIX="http://192.168.2.205:11434"
USE_PREPRO…
```
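For anyone wiring this up in code, a minimal sketch of reading such a `.env` file with the standard library alone (the `python-dotenv` package does the same thing more robustly):

```python
def parse_env(text):
    """Parse simple KEY=VALUE .env lines, stripping optional double quotes."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

sample = 'LLM_NAME="Ollama"\nOLLAMA_MODEL_NAME="qwen:7b"\nMIN_RELEVANCE_SCORE=0.3'
cfg = parse_env(sample)  # {'LLM_NAME': 'Ollama', 'OLLAMA_MODEL_NAME': 'qwen:7b', ...}
```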