-
Using the instructions here (https://github.com/ray-project/ray-llm#how-do-i-deploy-multiple-models-at-once), I'm trying to host two models on a single A100 80GB.
Two bundles are generated for the pla…
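For reference, the behavior I expected maps onto plain Ray Serve's fractional GPU scheduling rather than anything ray-llm-specific; here is a minimal sketch (the deployment names and the 0.5/0.5 split are my assumptions, and this is not ray-llm's own config format):

```python
# Minimal sketch: two Ray Serve deployments sharing one GPU via fractional
# num_gpus. Deployment names and the 0.5/0.5 split are assumptions.
from ray import serve

@serve.deployment(ray_actor_options={"num_gpus": 0.5})
class ModelA:
    async def __call__(self, request):
        return "response from model A"

@serve.deployment(ray_actor_options={"num_gpus": 0.5})
class ModelB:
    async def __call__(self, request):
        return "response from model B"

# Run both apps side by side; each worker reserves half of the A100.
serve.run(ModelA.bind(), name="model_a", route_prefix="/a")
serve.run(ModelB.bind(), name="model_b", route_prefix="/b")
```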
-
I tried to add the LM_STUDIO internal server as a model option, and I only tried it with a 2000-token context using the Google Gemma 7B model. I didn't get any results, even after upping the number of tok…
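For context, LM Studio's local server exposes an OpenAI-compatible API, so the quickest sanity check is to query it directly, outside the integration; a minimal sketch (the default port 1234 and the model identifier are assumptions; check what the LM Studio UI shows):

```python
# Minimal sketch: query LM Studio's OpenAI-compatible local server directly.
# The port (1234) and the model id are assumptions; check the LM Studio UI.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="google/gemma-7b",  # hypothetical id; use the one LM Studio lists
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```

If this also returns nothing, the problem is with the server or the model rather than the integration.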
-
Can someone please help me understand what I'm doing wrong here?
(llama_env) C:\Users\afull>torchrun --nproc_per_node 1 example_completion.py \
NOTE: Redirects are currently not supported in Windo…
-
# Problem
* Until we solve the 403 access problem (#676), there is no way to pull models from the Ollama server
* At the time I'm writing this, I don't think the Ollama registry (Docker …
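For context, pulling from a self-hosted Ollama server goes through its REST API, which is presumably where the 403 surfaces; a rough sketch of the call (Ollama's default port, placeholder model name):

```python
# Rough sketch: ask a running Ollama server to pull a model through its
# REST API. Default port; the model name is a placeholder.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/pull",
    json={"name": "gemma:7b", "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(json.loads(line).get("status"))  # e.g. "pulling manifest"
```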
-
In the Gemma 7B notebook, when rsLoRA and DoRA are active and the 4-bit and 8-bit settings are off, with r=8 and alpha=16, I encounter the error described below. I have targeted all linear layer…
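For reproduction, the settings above correspond roughly to this PEFT configuration (a sketch; the target-module list is my reading of "all linear layers" for Gemma):

```python
# Sketch of the configuration described above: rsLoRA + DoRA, r=8, alpha=16,
# no 4-bit/8-bit quantization. The target-module list is an assumption
# covering Gemma's linear projection layers.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    use_rslora=True,   # rank-stabilized LoRA scaling
    use_dora=True,     # weight-decomposed low-rank adaptation
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```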
-
### Description
Instead of downloading the models from HF, the services should fetch the weights via torrent.
### Dependencies
- This [implementation](https://github.com/premAI-io/from-hf-t…
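To make the fetch step concrete, a rough sketch using the libtorrent Python bindings (the magnet URI and save path are placeholders; verification, seeding policy, and resume handling are all ignored):

```python
# Rough sketch: fetch weights via BitTorrent instead of HF. The magnet URI
# and save path are placeholders; error handling is omitted.
import time
import libtorrent as lt

magnet = "magnet:?xt=urn:btih:..."  # placeholder for the weight archive
ses = lt.session()
params = lt.parse_magnet_uri(magnet)
params.save_path = "./weights"
handle = ses.add_torrent(params)

while not handle.status().is_seeding:  # download not finished yet
    s = handle.status()
    print(f"{s.progress * 100:.1f}% done, {s.download_rate / 1024:.0f} kB/s")
    time.sleep(5)
print("weights downloaded to ./weights")
```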
-
I've downloaded Ollama, and I'm not sure what I'm expecting to happen. I've pulled the model locally, but there is no guidance on what is expected to happen or how to use it.
Is it supposed to run on save? Is t…
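For context, after a pull, the model can at least be exercised by calling the local Ollama server directly; a minimal sketch (default port, placeholder model name):

```python
# Minimal sketch: exercise a pulled model via the local Ollama server's
# REST API. Default port; the model name is a placeholder.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma:7b", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```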
-
First, load the model with the internet connection ON:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dt…
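Then, a sketch of the offline re-load I'm aiming for, assuming the first run has cached the weights (HF_HUB_OFFLINE is standard Hugging Face Hub behavior, not anything Unsloth-specific):

```python
# Sketch: after the first online run cached the weights, force cache-only
# loading so no network access is attempted. Set the env var before loading.
import os
os.environ["HF_HUB_OFFLINE"] = "1"

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",
    max_seq_length = 2048,  # assumption; match the first run
    dtype = None,           # auto-detect, as in the standard Unsloth examples
    load_in_4bit = True,
)
```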
-
Instead of using ChatGPT, I would like to try using a local LLM. I am sure this would take some modifications, but I think we could make this work, and it would be an awesome addition to t…
-
The command `python3 torchchat.py where llama3` fails quietly, presumably because I don't have the HF token configured.
I assumed the code was broken, though, because I got a backtrace of the pr…
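For what it's worth, a minimal way to configure the token before retrying (a sketch; `huggingface-cli login` from a shell does the same thing interactively, and the HF_TOKEN variable name is just a convention here):

```python
# Minimal sketch: register a Hugging Face token so gated models such as
# llama3 can be downloaded. The HF_TOKEN env var is a convention; the
# token value itself is a placeholder you must supply.
import os
from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])  # or login() for an interactive prompt
```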