-
Tried TensorFlow and Torch with tinygrad;
still getting this error with Llama 3.1 8B and Llama 8B as well.
Apparently this is an OpenCL compile error for the bfloat16 data type.
Sorry, I am not a kernel …
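For anyone else hitting this, here is a minimal sketch (mine, not from the issue) that isolates the bfloat16 compile step in tinygrad on the OpenCL backend; the `"GPU"` device name and the float16 cast workaround are assumptions:
```python
# Minimal repro sketch: build a bfloat16 tensor on the OpenCL ("GPU") device
# and force a kernel compile via realize(). The cast below is a guessed
# workaround, not a confirmed fix.
from tinygrad import Tensor, dtypes

x = Tensor([1.0, 2.0, 3.0], dtype=dtypes.bfloat16, device="GPU")
try:
    (x + 1).realize()  # triggers OpenCL codegen for bfloat16
except Exception as e:
    print("bfloat16 kernel failed to compile:", e)
    # possible workaround: keep the tensor in float16 on OpenCL
    y = (x.cast(dtypes.float16) + 1).realize()
    print(y.numpy())
```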
-
### Is there an existing integration?
- [x] I have searched the existing integrations.
### Use Case
This feature would allow users to seamlessly integrate Modal's infrastructure for both inference …
-
This only happens with `BEAM=1`; `BEAM=0`, `BEAM=2`, and `BEAM=3` all work fine.
This happens because exo runs tinygrad inference on another thread.
Example command to reproduce: `DEBUG=6 BEAM=1 python3 …
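For reference, a minimal sketch of the pattern that seems to trip `BEAM=1` (this is my guess at the shape of exo's code path, not its actual code): realizing a tinygrad tensor from a worker thread with beam search enabled.
```python
# Sketch of the suspected failure mode: BEAM=1 kernel search running on a
# non-main thread, similar to exo's inference thread. The workload below is
# arbitrary; only the threading pattern matters.
import os
import threading

os.environ["BEAM"] = "1"  # set before tinygrad is imported

from tinygrad import Tensor

def worker():
    # realize on a secondary thread, like exo does for inference
    out = (Tensor.rand(256, 256) @ Tensor.rand(256, 256)).realize()
    print("ok", out.shape)

t = threading.Thread(target=worker)
t.start()
t.join()
```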
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### YOLOv8 Component
_No response_
### Bug
…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I am currently running Qwen2.5-72B-Instruct on a DGX PCIe server with vLLM as t…
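For context, a minimal sketch of how I'm invoking it offline; `tensor_parallel_size=8` and the dtype are assumptions about the DGX box, not details from the original question:
```python
# Offline inference sketch for a 72B model sharded across one node's GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",
    tensor_parallel_size=8,   # assumes 8 GPUs on the DGX node
    dtype="bfloat16",
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```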
-
### System Info
GPU: Nvidia H100
Model: Llama3 8B
### Who can help?
@kaiyux
### Information
- [x] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially suppo…
-
Hi,
I have a multi-node setup with multiple GPUs. I was able to get the cluster up, but I don't see the remaining GPUs from each node. How do I do that? I also observed the error below while using llama…
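If the cluster is Ray-based (an assumption on my part, e.g. vLLM's multi-node backend), a quick sanity check is whether Ray itself sees the other nodes' GPUs:
```python
# Hypothetical check, assuming a Ray cluster: cluster_resources() should report
# the GPU count across all nodes, not just the head node.
import ray

ray.init(address="auto")  # attach to the existing cluster
resources = ray.cluster_resources()
print("GPUs visible to the cluster:", resources.get("GPU", 0))
for node in ray.nodes():
    print(node["NodeManagerAddress"], node["Resources"].get("GPU", 0))
```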
-
I am instantiating an `LLM` class for local inference. I noticed that when an OOM error happens in `vllm.LLM.llm_engine.step()` and I catch it, previous requests are not aborted and would mess up with…
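A sketch of the workaround I have in mind; `abort_request()` and `has_unfinished_requests()` on the engine are assumptions that may differ across vLLM versions, and the model name is just a placeholder:
```python
# On OOM, explicitly abort everything still in flight before reusing the engine;
# otherwise stale requests linger in the scheduler and affect later runs.
import torch
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
engine = llm.llm_engine
engine.add_request("req-0", "Hello, world", SamplingParams(max_tokens=64))

in_flight = ["req-0"]
try:
    while engine.has_unfinished_requests():
        engine.step()
except torch.cuda.OutOfMemoryError:
    for rid in in_flight:
        engine.abort_request(rid)  # assumed engine API; version-dependent
    torch.cuda.empty_cache()
```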
-
I built the engine and had two separate LoRA layers with the base llama3.1 model. The output from the build is rank0.engine, config.json, and then a lora folder with the following structure:
lora
|
|…
-
Since [ApproxBayes.jl](https://github.com/marcjwilliams1/ApproxBayes.jl) was integrated here, there are a few more ABC engines available that are more fully featured and likely better supported at pre…