-
Running on 1x H100 with the latest Docker container from Docker Hub
```
>>> fast_pipe = optimum_pipeline('text-generation', 'meta-llama/Meta-Llama-3-8B-Instruct', use_fp8=True)
Special tokens have bee…
```
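For reference, a minimal sketch of how that call is typically set up, assuming the `optimum_pipeline` alias comes from optimum-nvidia's pipeline API (the truncated snippet does not show the import); the prompt is illustrative:

```python
# Sketch, assuming optimum-nvidia; only the model ID and use_fp8 flag come
# from the snippet above, the rest is illustrative.
from optimum.nvidia.pipelines import pipeline as optimum_pipeline

fast_pipe = optimum_pipeline(
    'text-generation',
    'meta-llama/Meta-Llama-3-8B-Instruct',
    use_fp8=True,  # FP8 requires Hopper-class hardware such as the H100
)
print(fast_pipe("Write a haiku about GPUs."))
```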
-
### Your current environment
Hey Team,
I was experimenting with the **LLM** class using gptq_marlin on the GPU, and it is incredibly fast. However, when I tried running it on the CPU, it seems that …
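For context, a minimal sketch of the GPU setup being described; the checkpoint is an illustrative GPTQ model, not taken from the report:

```python
# Sketch of the GPU path described above; the model ID is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-GPTQ",  # any GPTQ checkpoint
    quantization="gptq_marlin",             # Marlin kernels are CUDA-only
)
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```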
-
Hi,
I saved the LLaVA model in 4-bit using generate.py from:
https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llava
`model = optimize_model(model)` …
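For context, a sketch of the save/load flow around that call, assuming the ipex-llm low-bit API and that `model` is the LLaVA model loaded as in the linked generate.py; the save path is illustrative:

```python
# Sketch, assuming `model` was loaded as in the example's generate.py;
# the save path is illustrative.
from ipex_llm import optimize_model
from ipex_llm.optimize import load_low_bit

model = optimize_model(model)        # defaults to 4-bit (sym_int4) weight quantization
model.save_low_bit("./llava-4bit")   # persist the quantized weights

# Later, reload without re-quantizing:
model = load_low_bit(model, "./llava-4bit")
```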
-
Hello,
I used the latest steps to install ipex-llm into a venv on a 5th Gen Xeon system. I don't think AMX is being utilized, based on the screenshot below. Should AMX show up in the list of CPU features in o…
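One way to verify the hardware capability independently of ipex-llm (a quick sketch; the flag names are the standard ones the Linux kernel reports, the same list `lscpu` prints):

```python
# AMX shows up as the amx_tile / amx_bf16 / amx_int8 flags in /proc/cpuinfo.
with open("/proc/cpuinfo") as f:
    cpuinfo = f.read()
print([flag for flag in ("amx_tile", "amx_bf16", "amx_int8") if flag in cpuinfo])
```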
-
python/llm/example/GPU/Deepspeed-AutoTP/run_qwen_14b_arc_2_card.sh
python/llm/example/GPU/Deepspeed-AutoTP/run_vicuna_33b_arc_2_card.sh
python/llm/dev/benchmark/all-in-one/run-deepspeed-arc.sh
Cu…
-
### What happened?
The latest llama.cpp produces bad outputs for CodeShell, which performed well when it was first merged into llama.cpp.
After updating `convert-hf-to-gguf.py` and `convert-hf-to-g…
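A hypothetical repro sketch (the model directory, output path, binary name, and prompt are all assumptions, not from the report):

```python
# Hypothetical repro: convert the HF checkpoint with the updated script,
# then sample from the resulting GGUF file.
import subprocess

subprocess.run(
    ["python", "convert-hf-to-gguf.py", "models/CodeShell-7B",
     "--outfile", "codeshell-7b-f16.gguf"],
    check=True,
)
subprocess.run(
    ["./main", "-m", "codeshell-7b-f16.gguf", "-p", "def quick_sort(arr):"],
    check=True,
)
```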
-
### System Info
- CPU architecture: x86_64
- Host memory size: 32 GB
- GPU: NVIDIA RTX 2060
- GPU memory size: 12 GB
- TensorRT-LLM: v0.10.0
### Who can help?
_No response_
### Information
- [ ] Th…
-
I cannot run the quantized version of Qwen2-7B-Instruct locally. The system keeps raising a MemoryError, which seems quite strange. The same problem does not happen with other models such as Mistral-7B-Instr…
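For scale, a rough back-of-the-envelope for the weights alone (a sketch; real usage adds KV cache, activations, and framework overhead):

```python
# Rough weight-memory estimate for a ~7.6B-parameter model (Qwen2-7B's size).
params = 7.6e9
for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: {params * bytes_per_param / 2**30:.1f} GiB")
```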
-
Hi, I'm having this issue when connecting to external LLMs.
Environment of the server hosting the remote LLM:
- AMD 7950X3D
- 64 GB RAM
- 2x 7900 XTX
- Using LM Studio to host the LLM server
Environment Cli…
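For reference, a sketch of how a remote client typically talks to an LM Studio server, which exposes an OpenAI-compatible endpoint (default port 1234); the IP and model name are placeholders:

```python
# Sketch: LM Studio serves an OpenAI-compatible API; the IP and model name
# here are placeholders, not taken from the report.
from openai import OpenAI

client = OpenAI(base_url="http://192.168.1.50:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="local-model",  # LM Studio routes to whichever model is loaded
    messages=[{"role": "user", "content": "Hello from a remote client!"}],
)
print(resp.choices[0].message.content)
```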
-
**Describe the bug**
After changing the configuration in config.yaml, running `ilab xxx --help` shows defaults that are not consistent with config.yaml. E.g., after changing the default serve model to mixtral, the help message st…