-
Hello developers,
I was using cake to deploy the distributed Llama 3 8B Instruct model across 2 GPUs and got the error below:
```
CUDA_VISIBLE_DEVICES=0 ./target/release/cake-cli --model ~/.cache/hugging…
```
-
Things left to do after merging #300
# OpenAI
It seems to work ok with OpenAI in my limited testing.
# OpenRouter
I tested it with some models via OpenRouter and noticed it sometimes gets …
-
### Proposal to improve performance
_No response_
### Report of performance regression
_No response_
### Misc discussion on performance
---
**Setup Summary for vLLM Benchmarking with Llama…
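Since the setup summary above is truncated, here is a minimal offline throughput sketch using vLLM's Python API (`LLM` and `SamplingParams`); the model name, prompt batch, and sampling settings are illustrative assumptions, not the original configuration:

```python
import time
from vllm import LLM, SamplingParams

# Model and parallelism settings are assumptions for illustration.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel_size=1)
params = SamplingParams(temperature=0.0, max_tokens=128)

# A small synthetic batch; a real benchmark would vary prompt/output lengths.
prompts = ["Summarize the benefits of paged attention."] * 32

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tokens/s over {len(prompts)} prompts")
```

Counting generated tokens per wall-clock second like this gives a rough decode-throughput number; serving benchmarks would measure per-request latency against a running server instead.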
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
## Description
[Cross-region inference (CRI)](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html) allows requests to be automatically routed within any set of region…
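For illustration, a minimal sketch of opting into CRI from `boto3`: the request carries a geography-prefixed inference profile ID (the `us.` prefix below) instead of a plain model ID, and Bedrock routes it within that geography. The specific profile ID and region here are assumptions, not part of the original description:

```python
import boto3

# The "us." prefix on the model ID is what opts the request into
# cross-region inference; the profile ID below is illustrative.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello from CRI"}]}],
)
print(response["output"]["message"]["content"][0]["text"])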
-
We noticed that the current Triton decoding kernel is very slow on long contexts. This is because it lacks a flash-decoding-style optimization.
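For intuition, here is a minimal PyTorch sketch of the flash-decoding idea: split the KV cache along the sequence axis, attend to each chunk independently, and merge the partial outputs using their log-sum-exps. This is a reference illustration of the technique, not the Triton kernel in question:

```python
import torch

def flash_decode_attention(q, k, v, num_splits=4):
    """Split-KV attention for a single decode-step query token.

    q: (heads, dim); k, v: (seq, heads, dim). Each KV chunk yields a
    partial output plus its log-sum-exp; the partials are then merged.
    """
    heads, dim = q.shape
    scale = dim ** -0.5

    partial_out, partial_lse = [], []
    for kc, vc in zip(k.chunk(num_splits, dim=0), v.chunk(num_splits, dim=0)):
        scores = torch.einsum("hd,shd->hs", q, kc) * scale  # (heads, chunk)
        partial_lse.append(torch.logsumexp(scores, dim=-1))  # (heads,)
        probs = torch.softmax(scores, dim=-1)
        partial_out.append(torch.einsum("hs,shd->hd", probs, vc))

    out = torch.stack(partial_out)            # (splits, heads, dim)
    lse = torch.stack(partial_lse)            # (splits, heads)
    # Rescale each chunk's output by its share of the global softmax mass.
    weights = torch.softmax(lse, dim=0)       # (splits, heads)
    return (weights.unsqueeze(-1) * out).sum(dim=0)
```

The merged result matches plain softmax attention exactly; the long-context speedup comes from computing the chunks in parallel instead of scanning the whole KV cache in one sequential pass.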
## Reproduce
We test the decoding speed with a context length…
-
Hello.
It seems that the latest Ollama with IPEX-LLM (version `0.3.6`) is rather outdated nowadays.
It lacks proper support for new and popular models such as:
1) `Phi 3.5`
2) `Qwen 2.5`
3…
-
### System Info
requirements file:
```
transformers[torch]==4.44.2
onnxruntime
```
-
In this repo the Llama 3 tokenizer sets the `<|image|>` special token to `128011` https://github.com/meta-llama/llama-models/blob/ec6b56330258f6c544a6ca95c52a2aee09d8e3ca/models/llama3/api/tokenizer.py#L79-L101…
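For reference, one quick way to check what string a given ID maps to on the Hugging Face side (the repo name below is an assumption, and the meta-llama repos are gated):

```python
from transformers import AutoTokenizer

# Inspect which token string ID 128011 maps to in the HF tokenizer.
# Repo name is illustrative; gated meta-llama access is required.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(tok.convert_ids_to_tokens(128011))
```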
-
### What happened?
llama.cpp produces garbled output with Qwen2.5-7b-f16.gguf on the 310P3.
### Name and Version
./build/bin/llama-cli -m Qwen2.5-7b-f16.gguf -p "who are you" -ngl 32 -fa
### What operating system are you seeing the …