-
Running on 1x H100 with the latest Docker container from Docker Hub
```
>>> fast_pipe = optimum_pipeline('text-generation', 'meta-llama/Meta-Llama-3-8B-Instruct', use_fp8=True)
Special tokens have bee…
```
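For reference, a minimal sketch of how that call is typically set up, assuming the `optimum_pipeline` alias comes from optimum-nvidia's pipeline API (the truncated snippet does not show the import); the prompt is illustrative:

```python
# Sketch, assuming optimum-nvidia; only the model ID and use_fp8 flag come
# from the snippet above, the rest is illustrative.
from optimum.nvidia.pipelines import pipeline as optimum_pipeline

fast_pipe = optimum_pipeline(
    'text-generation',
    'meta-llama/Meta-Llama-3-8B-Instruct',
    use_fp8=True,  # FP8 requires Hopper-class hardware such as the H100
)
print(fast_pipe("Write a haiku about GPUs."))
```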
-
### Your current environment
Hey Team,
I was experimenting with the **LLM** class using gptq_marlin on the GPU, and it is incredibly fast. However, when I tried running it on the CPU, it seems that …
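For context, a minimal sketch of the GPU setup being described; the checkpoint is an illustrative GPTQ model, not taken from the report:

```python
# Sketch of the GPU path described above; the model ID is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-GPTQ",  # any GPTQ checkpoint
    quantization="gptq_marlin",             # Marlin kernels are CUDA-only
)
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```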
-
Hi,
I saved the LLaVA model in 4-bit using generate.py from:
https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llava
`model = optimize_model(model)` …
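For context, a sketch of the save/load flow around that call, assuming the ipex-llm low-bit API and that `model` is the LLaVA model loaded as in the linked generate.py; the save path is illustrative:

```python
# Sketch, assuming `model` was loaded as in the example's generate.py;
# the save path is illustrative.
from ipex_llm import optimize_model
from ipex_llm.optimize import load_low_bit

model = optimize_model(model)        # defaults to 4-bit (sym_int4) weight quantization
model.save_low_bit("./llava-4bit")   # persist the quantized weights

# Later, reload without re-quantizing:
model = load_low_bit(model, "./llava-4bit")
```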
-
Hello,
I used the latest steps to install ipex-llm into a venv on a 5th Gen Xeon system. I don't think AMX is being utilized, based on the screenshot below. Should AMX show up in the list of CPU features in o…
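One way to verify the hardware capability independently of ipex-llm (a quick sketch; the flag names are the standard ones the Linux kernel reports, the same list `lscpu` prints):

```python
# AMX shows up as the amx_tile / amx_bf16 / amx_int8 flags in /proc/cpuinfo.
with open("/proc/cpuinfo") as f:
    cpuinfo = f.read()
print([flag for flag in ("amx_tile", "amx_bf16", "amx_int8") if flag in cpuinfo])
```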
-
python/llm/example/GPU/Deepspeed-AutoTP/run_qwen_14b_arc_2_card.sh
python/llm/example/GPU/Deepspeed-AutoTP/run_vicuna_33b_arc_2_card.sh
python/llm/dev/benchmark/all-in-one/run-deepspeed-arc.sh
Cu…
-
### What happened?
The latest llama.cpp produces bad outputs for CodeShell, which performed well when it was first merged into llama.cpp.
After updating `convert-hf-to-gguf.py` and `convert-hf-to-g…
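A hypothetical repro sketch (the model directory, output path, binary name, and prompt are all assumptions, not from the report):

```python
# Hypothetical repro: convert the HF checkpoint with the updated script,
# then sample from the resulting GGUF file.
import subprocess

subprocess.run(
    ["python", "convert-hf-to-gguf.py", "models/CodeShell-7B",
     "--outfile", "codeshell-7b-f16.gguf"],
    check=True,
)
subprocess.run(
    ["./main", "-m", "codeshell-7b-f16.gguf", "-p", "def quick_sort(arr):"],
    check=True,
)
```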
-
### System Info
- CPU architecture: x86_64
- Host memory size: 32 GB
- GPU: NVIDIA RTX 2060
- GPU memory size: 12 GB
- TensorRT-LLM: v0.10.0
### Who can help?
_No response_
### Information
- [ ] Th…
-
I cannot run the quantized version of Qwen2-7B-Instruct locally. The system keeps raising a MemoryError, which seems quite strange. The same problem does not happen with other models such as Mistral-7B-Instr…
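For scale, a rough back-of-the-envelope for the weights alone (a sketch; real usage adds KV cache, activations, and framework overhead):

```python
# Rough weight-memory estimate for a ~7.6B-parameter model (Qwen2-7B's size).
params = 7.6e9
for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: {params * bytes_per_param / 2**30:.1f} GiB")
```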
-
Hi, I'm having this issue when connecting to external LLMs.
Environment of the server hosting the remote LLM:
- AMD 7950X3D
- 64 GB RAM
- 2x 7900 XTX
- Using LM Studio to host the LLM server
Environment Cli…
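For reference, a sketch of how a remote client typically talks to an LM Studio server, which exposes an OpenAI-compatible endpoint (default port 1234); the IP and model name are placeholders:

```python
# Sketch: LM Studio serves an OpenAI-compatible API; the IP and model name
# here are placeholders, not taken from the report.
from openai import OpenAI

client = OpenAI(base_url="http://192.168.1.50:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="local-model",  # LM Studio routes to whichever model is loaded
    messages=[{"role": "user", "content": "Hello from a remote client!"}],
)
print(resp.choices[0].message.content)
```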
-
**Describe the bug**
After changing the configuration in config.yaml, running `ilab xxx --help` shows defaults that are not consistent with config.yaml. E.g., after changing the default serve model to mixtral, the help message st…