Closed: mirek190 closed this issue 1 year ago
@Josh-XT I expect this is low priority; I don't know if we want to support all external libraries. I'm thinking about splitting these out into an extras package (a subpackage of AGiXT) to separate responsibilities.
@mirek190 As a quick fix, try running text-generation-webui (oobabooga), loading the model there, and connecting via the oobabooga provider in AGiXT.
Oobabooga uses the GPU for models, so you will not be able to use big models. I want to use my CPU for it (llama.cpp is the most advanced and really fast, especially with ggmlv3 models), since I can run much bigger models like 30B 5-bit or even 65B 5-bit, which are far more capable in understanding and reasoning than any 7B or 13B model. For instance, with an RTX 3080 and llama.cpp you can run 65B ggmlv3 q4 models with more than half the layers on the GPU and the rest on the CPU, and still get 6 tokens/s!
65B q4 models beat any smaller model; 7B, 13B, or 30B are not even close. They are very close to ChatGPT 3.5 in reasoning, and sometimes even beat it, especially gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin, which is closer to GPT-4.
Do you understand how big the progress of llama.cpp is compared to other projects? :) That is why I want it to be supported in this project.
Not a low priority, just trying to get through bug fixes currently. I just removed the version cap for llama-cpp-python so that the latest can be used:
pip install llama-cpp-python --upgrade
Please let me know if there are additional flags or features that I should make available for this provider.
--mlock
--threads
--batch_size
--n_predict
--top_k
--top_p
--temp
--repeat_penalty
--ctx_size
--n-gpu-layers
Those are the most important ones.
It should also support cuBLAS. That speeds up prompt processing 3-4x for me.
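For reference, most of these CLI flags map directly onto llama-cpp-python's `Llama` constructor and its generation call. A minimal sketch of that mapping, based on the llama-cpp-python API of that period; the model path and values are placeholders, and the actual load/generate calls are left commented out since they need a real model file:

```python
# Constructor-time options (rough CLI flag -> kwarg mapping):
#   --ctx_size      -> n_ctx
#   --threads       -> n_threads
#   --batch_size    -> n_batch
#   --mlock         -> use_mlock
#   --n-gpu-layers  -> n_gpu_layers (only helps with a cuBLAS-enabled build)
load_kwargs = dict(
    model_path="models/your-model.ggmlv3.q5_1.bin",  # placeholder path
    n_ctx=2048,
    n_threads=24,
    n_batch=512,
    use_mlock=True,
    n_gpu_layers=40,
)

# Generation-time options:
#   --n_predict -> max_tokens, --temp -> temperature;
#   --top_k, --top_p, and --repeat_penalty keep their names.
gen_kwargs = dict(
    max_tokens=256,
    temperature=0.4,
    top_k=40,
    top_p=0.95,
    repeat_penalty=1.1,
)

# from llama_cpp import Llama          # pip install llama-cpp-python
# llm = Llama(**load_kwargs)           # loads the model (slow, needs RAM/VRAM)
# out = llm("Q: What is AGiXT? A:", **gen_kwargs)
# print(out["choices"][0]["text"])
```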
Sorry, I haven't been keeping up with the llamacpp changes, but I have heard they're amazing! I mostly use OpenAI for all of my testing currently, just for its simple speed and reliability. I fully intend to switch to local models once I can run the 8k+ context models locally (which I should be able to do now with llamacpp, I've just been busy).
Here is the module we use:
https://github.com/abetlen/llama-cpp-python
If you can confirm they have the features there, I can add anything necessary. I think the GPU layers option is new in the llamacpp Python module; I can add that as an agent setting.
I will check that module later and let you know 👍
Thanks for your hard work.
Merging #431 to hopefully resolve this. Please try it out and let me know how it goes!
    if self.ctx is not None:
AttributeError: 'Llama' object has no attribute 'ctx'
Exception ignored in: <function Llama.__del__ at 0x00000183015FB250>
Traceback (most recent call last):
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\llama_cpp\llama.py", line 1219, in __del__
    if self.ctx is not None:
AttributeError: 'Llama' object has no attribute 'ctx'
Exception ignored in: <function Llama.__del__ at 0x00000183015FB250>
Traceback (most recent call last):
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\llama_cpp\llama.py", line 1219, in __del__
    if self.ctx is not None:
AttributeError: 'Llama' object has no attribute 'ctx'
2023-05-21 18:07:20.519 Uncaught app exception
Traceback (most recent call last):
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "F:\LLAMA\AGiXT\agixt\pages\Chat.py", line 82, in <module>
    self.params.n_gpu_layers = n_gpu_layers  <-- must be an integer
AttributeError: 'Llama' object has no attribute 'ctx'  <-- should be "n_ctx"?
From llama.py:

    """Load a llama.cpp model from `model_path`.

    Args:
        model_path: Path to the model.
        n_ctx: Maximum context size.
        n_parts: Number of parts to split the model into. If -1, the number of parts is automatically determined.
        seed: Random seed. 0 for random.
        f16_kv: Use half-precision for key/value cache.
        logits_all: Return logits for all tokens, not just the last token.
        vocab_only: Only load the vocabulary, no weights.
        use_mmap: Use mmap if possible.
        use_mlock: Force the system to keep the model in RAM.
        embedding: Embedding mode only.
        n_threads: Number of threads to use. If None, the number of threads is automatically determined.
        n_batch: Maximum number of prompt tokens to batch together when calling llama_eval.
        last_n_tokens_size: Maximum number of tokens to keep in the last_n_tokens deque.
        lora_base: Optional path to base model, useful if using a quantized base model and you want to apply LoRA to an f16 model.
        lora_path: Path to a LoRA file to apply to the model.
        verbose: Print verbose output to stderr.
    """
    self.verbose = verbose
    self.model_path = model_path
    self.params = llama_cpp.llama_context_default_params()
    self.params.n_ctx = n_ctx
    self.params.n_parts = n_parts
    self.params.n_gpu_layers = n_gpu_layers
    self.params.seed = seed
    self.params.f16_kv = f16_kv
    self.params.logits_all = logits_all
    self.params.vocab_only = vocab_only
    self.params.use_mmap = use_mmap if lora_path is None else False
    self.params.use_mlock = use_mlock
    self.params.embedding = embedding
The newest llama.cpp executable builds now include an API server.
https://github.com/ggerganov/llama.cpp/releases/tag/master-7e4ea5b
https://github.com/ggerganov/llama.cpp/tree/master/examples/server
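To talk to that server directly, the examples/server README of that era described a POST /completion endpoint taking a JSON body. A hedged sketch of such a request; the field names and default port 8080 are taken from that README and should be checked against your build, and the actual HTTP call is commented out since it needs a running server:

```python
import json
# from urllib.request import Request, urlopen  # uncomment to actually call the server

# JSON body for the llama.cpp server's /completion endpoint (field names per
# the examples/server README; treat them as assumptions for your version).
payload = {
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 128,
    "temperature": 0.4,
}
body = json.dumps(payload).encode("utf-8")

# req = Request("http://localhost:8080/completion", data=body,
#               headers={"Content-Type": "application/json"})
# print(json.loads(urlopen(req).read())["content"])
```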
Working on this in #446 . If you have the API server running, you're welcome to try it.
This was fixed.
I tried to use the llama.cpp server but without success ... Any guide on how to use it here?
Don't use the llamacppapi one, just use the llamacpp one. The API one is still in progress; I haven't been able to run the llamacpp server myself yet to test that one fully.
If it helps to know, these are my settings for my working Vicuna 13B with my llamacpp agent.
{
"commands": {},
"settings": {
"provider": "llamacpp",
"AI_MODEL": "vicuna",
"AI_TEMPERATURE": "0.4",
"MAX_TOKENS": "2000",
"embedder": "default",
"MODEL_PATH": "/home/josh/josh/Repos/ggml-vicuna-13b-1.1/ggml-vic13b-uncensored-q5_1.bin",
"GPU_LAYERS": "40",
"BATCH_SIZE": "512",
"THREADS": "24",
"STOP_SEQUENCE": "</s>",
"SEARXNG_INSTANCE_URL": "https://searx.work",
"HUGGINGFACE_AUDIO_TO_TEXT_MODEL": "facebook/wav2vec2-large-960h-lv60-self",
"USE_BRIAN_TTS": "True",
"ELEVENLABS_VOICE": "Josh",
"SELENIUM_WEB_BROWSER": "chrome",
"DISCORD_COMMAND_PREFIX": "/AGiXT",
"WORKING_DIRECTORY": "./WORKSPACE",
"WORKING_DIRECTORY_RESTRICTED": "True",
"": ""
}
}
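Note that these agent settings are all stored as strings, while llama-cpp-python expects integers for things like n_gpu_layers (which matches the "must be an integer" note in the traceback above). A hypothetical conversion step a provider could perform; the key names come from the settings JSON above, and the kwarg mapping is an assumption about how the llamacpp provider wires them up:

```python
import json

# A subset of the agent settings above, as AGiXT stores them (all strings).
settings = json.loads("""
{
    "MAX_TOKENS": "2000",
    "GPU_LAYERS": "40",
    "BATCH_SIZE": "512",
    "THREADS": "24"
}
""")

# Cast to the integer kwargs llama-cpp-python's Llama constructor expects.
llama_kwargs = {
    "n_ctx": int(settings["MAX_TOKENS"]),
    "n_gpu_layers": int(settings["GPU_LAYERS"]),
    "n_batch": int(settings["BATCH_SIZE"]),
    "n_threads": int(settings["THREADS"]),
}
# Llama(model_path=..., **llama_kwargs) would then receive proper ints
# instead of the strings that trigger type errors.
```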
Description
AGiXT's llama.cpp provider does not support ggmlv2 models or q5_1 (5-bit) quantization. Those ggmlv2 models are actually obsolete now, because llama.cpp has already moved on to ggmlv3 models ...
Another question: can I pass llama.cpp parameters somehow? For instance -ngl (GPU offloading) or cuBLAS (prompt processing on the GPU as well)?
llama.cpp: loading model from models/wizardLM-7B-uncensored-ggmlv2-q5_1.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?
llama_init_from_file: failed to load model
INFO: 127.0.0.1:4770 - "GET /api/agent/Wizard/command HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\uvicorn\protocols\http\httptools_impl.py", line 435, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\uvicorn\middleware\proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\fastapi\applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\starlette\applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\starlette\middleware\errors.py", line 184, in __call__
    raise exc
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\starlette\middleware\errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\starlette\middleware\cors.py", line 91, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\starlette\middleware\cors.py", line 146, in simple_response
    await self.app(scope, receive, send)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
    raise e
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\starlette\routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\starlette\routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\starlette\routing.py", line 66, in app
    response = await func(request)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\fastapi\routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\fastapi\routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "F:\LLAMA\AGIXT\AGiXT\src\agixt\app.py", line 221, in get_commands
    commands = Commands(agent_name)
  File "F:\LLAMA\AGIXT\AGiXT\src\agixt\Commands.py", line 13, in __init__
    self.CFG = Agent(self.agent_name)
  File "F:\LLAMA\AGIXT\AGiXT\src\agixt\Config\Agent.py", line 28, in __init__
    self.PROVIDER = Provider(self.AI_PROVIDER, self.PROVIDER_SETTINGS)
  File "F:\LLAMA\AGIXT\AGiXT\src\agixt\provider\__init__.py", line 24, in __init__
    self.instance = provider_class(**kwargs)
  File "F:\LLAMA\AGIXT\AGiXT\src\agixt\provider\llamacpp.py", line 30, in __init__
    self.model = Llama(model_path=MODEL_PATH, n_ctx=self.MAX_TOKENS * 2)
  File "C:\Users\mirek190\AppData\Roaming\Python\Python310\site-packages\llama_cpp\llama.py", line 159, in __init__
    assert self.ctx is not None
AssertionError
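The "unknown (magic, version) combination: 67676a74, 00000002" line above means the loader read magic 0x67676a74 ('ggjt') with file version 2 (a ggmlv2 file) and did not recognize it: the bundled llama.cpp was simply too old for that format. A small sketch for inspecting a model header yourself; the magic constants are the 'ggml' and 'ggjt' values from llama.cpp of that era, and the helper function name is just illustrative:

```python
import struct

def read_ggml_header(first8: bytes):
    """Decode the first 8 bytes of a GGML-family model file.

    Note: the original unversioned 'ggml' format has no version field, so
    the second value is only meaningful for 'ggjt' files (a sketch).
    """
    magic, version = struct.unpack("<II", first8[:8])  # two little-endian uint32s
    names = {0x67676D6C: "ggml (unversioned)", 0x67676A74: "ggjt"}
    return names.get(magic, hex(magic)), version

# Simulated header of a ggjt v2 file: the bytes 'tjgg' on disk read back as
# the little-endian uint32 0x67676a74, matching the error message above.
fmt, ver = read_ggml_header(b"tjgg" + (2).to_bytes(4, "little"))
```

With a real file you would pass `open(path, "rb").read(8)` instead of the simulated bytes.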
Steps to Reproduce the Bug
load model
Expected Behavior
Working model
Actual Behavior
Error?
Additional Context / Screenshots
No response
Operating System
Python Version
Environment Type - Connection
Environment Type - Container
Acknowledgements