-
### What happened?
Llama 3.1 8B quantized after https://github.com/ggerganov/llama.cpp/pull/8676 fails the "wicks" problem that Llama 3 8B can answer correctly.
Prompt: `Making one candle requir…
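The prompt is truncated above, so as a minimal sketch of how one might compare the two quantizations side by side with llama-cpp-python (the model paths and filled-in prompt are assumptions, not from the issue):
```python
# Repro sketch using llama-cpp-python; model paths are placeholders and the
# truncated "wicks" prompt must be pasted in by hand.
from llama_cpp import Llama

PROMPT = "..."  # the truncated "Making one candle requir…" prompt goes here

for path in ("llama-3-8b-instruct.Q4_K_M.gguf",     # quantized before the PR
             "llama-3.1-8b-instruct.Q4_K_M.gguf"):  # quantized after PR #8676
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    out = llm(PROMPT, max_tokens=256, temperature=0.0)  # greedy, for a stable comparison
    print(path, "->", out["choices"][0]["text"].strip())
```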
-
Hi intel team,
I have pruned and quantized several models using your toolkit, and I'm currently aiming to run inference with your pipeline on my GPT-2 code generation model. To do so I need to expor…
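The issue is cut off before naming the export target. Assuming the goal is an ONNX export of the GPT-2 checkpoint, a generic sketch with Hugging Face Optimum (not Intel's own pipeline, which the truncated text does not show) might look like:
```python
# Hedged sketch: generic ONNX export of a GPT-2 causal-LM checkpoint.
# "my-gpt2-codegen" is a placeholder for the pruned/quantized model directory.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model = ORTModelForCausalLM.from_pretrained("my-gpt2-codegen", export=True)
tokenizer = AutoTokenizer.from_pretrained("my-gpt2-codegen")
model.save_pretrained("gpt2-onnx")      # writes model.onnx plus config
tokenizer.save_pretrained("gpt2-onnx")  # keep the tokenizer alongside it
```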
-
### What happened?
I'm using a 7900 XTX and get only ~3 t/s when running llama.cpp inference on qwen2-7b-instruct-q5_k_m.gguf. Whether I set -ngl 1000 or -ngl 0, I find that the GPU's VRAM usage stays very low, …
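VRAM staying low regardless of -ngl usually means no layers are actually being offloaded, e.g. a build without a working ROCm backend. One quick check, sketched here with llama-cpp-python (the model path is a placeholder), is to load with all layers offloaded and read the startup log:
```python
# Sketch: verify that layers really get offloaded. With verbose=True,
# llama.cpp prints a line like "llm_load_tensors: offloaded 29/29 layers
# to GPU" at load time; "offloaded 0/29" points at a CPU-only build.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2-7b-instruct-q5_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,  # request offload of all layers
    verbose=True,     # print backend/offload info to stderr
)
```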
-
**Describe the bug**
![Screenshot 2024-05-04 at 4 41 10 AM](https://github.com/neelnanda-io/TransformerLens/assets/310981/69c34618-015f-4cd9-9ed6-4e0b295982e9)
I followed the instructions in `docs…
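The issue body is cut off after the screenshot. For context, a minimal TransformerLens smoke test that the documented setup should support looks like this (the model name "gpt2" is an assumed example):
```python
# Minimal TransformerLens smoke test; "gpt2" is an assumed example model.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
logits, cache = model.run_with_cache("Hello, world")
print(logits.shape)  # [batch, seq_len, d_vocab]
```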
-
### What is the issue?
Hi everyone,
Sorry, I don't have time to write much; but going from 1.32 to 1.33, this:
```
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: CUDA_USE_TENS…
```
-
### System Info
- `transformers` version: 4.26.1
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.31
- Python version: 3.10.9
- Huggingface_hub version: 0.12.1
- PyTorch version (GPU?): 2.0…
-
### What happened?
### Problem
Some models produce corrupted output when offloading to multiple CUDA GPUs. The problem disappears when offloading to a single GPU or using CPU only.
I was able…
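Since the corruption disappears on a single GPU, one way to confirm the diagnosis is to hide all but one device before the CUDA runtime initializes. A framework-agnostic sketch using the standard CUDA_VISIBLE_DEVICES variable:
```python
# Sketch: expose only one CUDA device to confirm that multi-GPU
# offloading is the trigger. The variable must be set before any
# CUDA-aware library initializes the runtime.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0

import torch  # optional sanity check; imported after the env var is set
print(torch.cuda.device_count())  # expected: 1
```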
-
I'm running Unsloth to fine-tune a LoRA on the llama3-8b Instruct model.
1: I merge the model with the LoRA adapter into safetensors (a generic merge sketch follows this list).
2: Running inference in Python both with the merged model direct…
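The merge step itself is truncated; a common way to produce a merged safetensors checkpoint from a base model plus LoRA adapter is PEFT's merge_and_unload. This is a generic sketch rather than Unsloth's own save helper, and the directory names are placeholders:
```python
# Generic LoRA merge sketch with PEFT; paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# Fold the adapter weights into the base weights, then drop the adapter.
merged = PeftModel.from_pretrained(base, "lora-adapter-dir").merge_and_unload()

merged.save_pretrained("merged-model", safe_serialization=True)  # .safetensors
AutoTokenizer.from_pretrained(BASE).save_pretrained("merged-model")
```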
-
After running several inference cases, stats.json in the output folder successfully collected 3 inference_runtime entries. But they do not show up in the browser at http://localhost:8000/ under Efficiency Metr…
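Since the tool and page name are truncated here, a quick sanity check is to confirm that stats.json really contains the three runs the page should be rendering. A sketch, where the file path and schema are assumptions based on the issue's mention of collected inference_runtime entries:
```python
# Sketch: inspect stats.json directly; the path is a placeholder and the
# schema is assumed from the issue's mention of "inference_runtime".
import json

with open("output/stats.json") as f:
    stats = json.load(f)

# Eyeball the structure and confirm the three inference_runtime entries exist.
print(json.dumps(stats, indent=2))
```
If the entries are present in the file but missing from the page, the problem is likely in how the frontend reads or filters the file rather than in the collection step.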
-
llama3 released
Would be happy to use it with llama.cpp:
https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6
https://github.com/meta-llama/llama3