-
There is an error when trying to load the model.
The error is in the model itself: `checkpoint = torch.load(local_embedding_path, map_location="cpu")['weight']`
This apparently expects embed_ll…
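In case it helps debugging, here is a minimal sketch (the path is illustrative, and the key-fallback logic is an assumption about how the checkpoint might have been saved) that prints the checkpoint's top-level keys before indexing, which makes a `'weight'` vs. `embed_...` key mismatch visible:

```python
import torch

# Illustrative path; substitute the real embedding checkpoint.
local_embedding_path = "embeddings.pt"

# Load on CPU and inspect the top-level keys before indexing, since the
# expected key ('weight' vs. an 'embed_...'-prefixed name) depends on how
# the checkpoint was saved.
checkpoint = torch.load(local_embedding_path, map_location="cpu")
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))
    # Fall back to the first entry if 'weight' is absent (assumption).
    weight = checkpoint.get("weight", next(iter(checkpoint.values())))
else:
    weight = checkpoint  # checkpoint was saved as a bare tensor
```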
-
Why do some outputs look like this:
```
Moviepy - Done !
Moviepy - video read…
```
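Those lines are MoviePy's own progress logger rather than an error. If the goal is to silence them, a minimal sketch (file names are illustrative) that passes `logger=None` to the write call:

```python
from moviepy.editor import VideoFileClip

# MoviePy emits the "Moviepy - ..." progress lines through its
# proglog-based logger; logger=None suppresses them.
clip = VideoFileClip("input.mp4")
clip.write_videofile("output.mp4", logger=None)
```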
-
I have tried your llama example; the output is **random** and it took 770 seconds to finish:
**Command:**
```
python src/run_generation.py --model_type llama --model_name_or_path meta-llama/Ll…
```
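One common reason generated text looks random is the decoding configuration rather than the weights themselves. As a hedged sketch (the model id and prompt are illustrative, and meta-llama checkpoints require access approval), forcing greedy decoding with `transformers` can rule sampling out:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model id; substitute the checkpoint used in the report.
model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
# do_sample=False forces deterministic greedy decoding; sampled decoding
# (do_sample=True with a high temperature) can look "random".
output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```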
-
Hello, @b4rtaz!
I'm trying to run the model [nkpz/llama2-22b-chat-wizard-uncensored](https://huggingface.co/nkpz/llama2-22b-chat-wizard-uncensored) on a cluster composed of 1 Raspberry Pi 4B 8 GB and 7…
-
Hi,
I am trying to convert the Llama 2 7B model with the script below:
python export_meta_llama_bin.py ~/projects/75_NLP/llama-main/llama-2-7b llama2_7b.bin
It always pops up a "killed" message.
My hardwa…
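A "killed" message during conversion usually means the Linux OOM killer stopped the process while the full checkpoint was being materialized in RAM. As a sketch of one workaround (this is not the project's export script, and the shard filename is an assumption about Meta's download layout), PyTorch 2.1+ can memory-map the checkpoint instead of reading it all into memory at once:

```python
import os
import torch

# Assumed path to one consolidated Llama 2 checkpoint shard.
ckpt_path = os.path.expanduser(
    "~/projects/75_NLP/llama-main/llama-2-7b/consolidated.00.pth"
)

# mmap=True (PyTorch >= 2.1) maps tensor storage from disk on demand
# instead of allocating the whole ~13 GB checkpoint in RAM up front,
# which is the usual trigger for the OOM killer's "Killed" message.
state_dict = torch.load(
    ckpt_path, map_location="cpu", mmap=True, weights_only=True
)
print(sum(t.numel() for t in state_dict.values()), "parameters mapped")
```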
-
### 📚 The doc issue
https://github.com/pytorch/executorch/tree/main/examples/qualcomm
- On the 3) qaihub_scripts, it still mentions llama2. Can we update those to reflect the latest support for l…
-
/kind feature
**Describe the solution you'd like**
To autoscale LLM inference services, Knative's request-level metrics may not be the best scaling signal, as LLM inference is performed at the toke…
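For illustration, one way a token-level signal could be exposed for an autoscaler to scrape is a custom Prometheus counter. Everything below (metric name, port, update cadence) is a hypothetical sketch, not an existing Knative integration:

```python
import time
from prometheus_client import Counter, start_http_server

# Hypothetical token-level metric an autoscaler could consume, in contrast
# to Knative's built-in request-level concurrency/RPS metrics.
GENERATED_TOKENS = Counter(
    "llm_generated_tokens_total", "Total tokens generated by this replica"
)

def record_generation(num_tokens: int) -> None:
    # Called once per decode step or per completed request.
    GENERATED_TOKENS.inc(num_tokens)

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for scraping
    while True:
        record_generation(16)  # stand-in for real decoding work
        time.sleep(1.0)
```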
-
Hello,
I can't run step 4 of the instructions available at https://github.com/pytorch/executorch/tree/main/examples/models/llama2
When I run step _2. Build llama runner._ I get an error…
-
Knowing that the Ollama server supports the OpenAI API ([https://ollama.com/blog/openai-compatibility](https://ollama.com/blog/openai-compatibility)), the goal is to **point Cursor to query the local Ollama se…
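Before pointing Cursor at it, the endpoint can be sanity-checked with the official `openai` Python client, as the linked post describes; the model name below is illustrative and must already be pulled locally:

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible endpoint at /v1; the API key is
# required by the client but ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Illustrative model name; pull it first with `ollama pull llama2`.
response = client.chat.completions.create(
    model="llama2",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```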
-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 250GB total
- GPU properties
  - GPU name: 2x NVIDIA A100 80GB
  - GPU memory size: 160GB total
- Libraries
  - tensorrt @ fi…