-
I followed this guide: https://developer.apple.com/metal/pytorch/
I can use ComfyUI with hardware offloading via MPS.
OmniGen errors out with:
```
No avaliable GPU, offload_kv_cache wiil be se…
```
-
There's a new caching technique described in the paper https://arxiv.org/abs/2312.17238 (GitHub: https://github.com/dvmazur/mixtral-offloading).
They introduce an LRU cache to cache experts based on patt…
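Here is a minimal sketch of the general idea. It is not the mixtral-offloading implementation. An LRU cache keeps up to `capacity` experts resident "on the GPU". When a token is routed to an expert that is not resident, the least recently used expert is evicted to make room. `ExpertLRUCache` and `load_expert` are hypothetical names, and a callback simulates loading instead of performing a real host-to-device copy.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class ExpertLRUCache:
    """Keep at most `capacity` experts resident; evict the least recently used.

    Hypothetical sketch: `load_expert` stands in for copying an expert's
    weights from host memory to the GPU.
    """

    def __init__(self, capacity: int, load_expert: Callable[[Hashable], object]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_expert = load_expert
        self._resident: OrderedDict = OrderedDict()  # expert_id -> weights
        self.hits = 0
        self.misses = 0

    def get(self, expert_id: Hashable) -> object:
        if expert_id in self._resident:
            # Cache hit: mark this expert as the most recently used.
            self._resident.move_to_end(expert_id)
            self.hits += 1
            return self._resident[expert_id]
        # Cache miss: evict the least recently used expert if full, then load.
        self.misses += 1
        if len(self._resident) >= self.capacity:
            self._resident.popitem(last=False)  # first entry = least recently used
        weights = self.load_expert(expert_id)
        self._resident[expert_id] = weights
        return weights

    def resident(self) -> List[Hashable]:
        """Expert ids currently cached, from least to most recently used."""
        return list(self._resident)


if __name__ == "__main__":
    cache = ExpertLRUCache(capacity=2, load_expert=lambda i: f"weights[{i}]")
    # Toy routing decisions; repeated experts hit the cache.
    for expert in [0, 1, 0, 2, 0, 1]:
        cache.get(expert)
    print(cache.resident(), cache.hits, cache.misses)  # [0, 1] 2 4
```

With `capacity=2` and the routing sequence `[0, 1, 0, 2, 0, 1]`, experts 0 and 1 end up resident, with 2 hits and 4 misses. Expert 0 is reused often enough that it is never evicted.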
-
- https://www.ralfj.de/blog/
- https://noidea.dog/glue
- https://xlinux.nist.gov/dads/
- https://blog.sulami.xyz/posts/what-is-in-a-rust-allocator/
- https://quickwit.io/blog/performance-investiga…
-
### Describe the bug
Sequential offloading doesn't work under `pytest`, but it does seem to work outside of tests.
This is a problem because we can't properly test sequential offloading on Stabl…
-
It seems that pipelining could greatly simplify the implementation of a feature such as fairscale's OffloadModel (https://fairscale.readthedocs.io/en/latest/deep_dive/offload.html).
Is this s…
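For context, OffloadModel splits the model into slices that live in CPU memory and moves each slice to the GPU only for its part of the forward pass. The pure-Python sketch below simulates that schedule with toy layers. `split_into_slices` and `offloaded_forward` are hypothetical helpers, and the "gpu"/"cpu" log entries stand in for real device copies.

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]


def split_into_slices(layers: Sequence[Layer], num_slices: int) -> List[List[Layer]]:
    """Partition layers into `num_slices` contiguous, roughly equal slices."""
    if not 1 <= num_slices <= len(layers):
        raise ValueError("num_slices must be between 1 and len(layers)")
    base, extra = divmod(len(layers), num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        size = base + (1 if i < extra else 0)  # first `extra` slices get one more layer
        slices.append(list(layers[start:start + size]))
        start += size
    return slices


def offloaded_forward(x: float, slices: List[List[Layer]], log: List[str]) -> float:
    """Run the slices in order, keeping only one slice 'on the GPU' at a time."""
    for idx, model_slice in enumerate(slices):
        log.append(f"load slice {idx} -> gpu")     # stand-in for a host-to-device copy
        for layer in model_slice:
            x = layer(x)
        log.append(f"offload slice {idx} -> cpu")  # stand-in for a device-to-host copy
    return x


if __name__ == "__main__":
    layers = [lambda v, k=k: v + k for k in range(5)]  # toy layers: add 0, 1, 2, 3, 4
    slices = split_into_slices(layers, num_slices=2)
    log: List[str] = []
    print(offloaded_forward(1.0, slices, log))  # 11.0
    print([len(s) for s in slices])             # [3, 2]
```

This version is strictly sequential, so each copy blocks compute. A pipelined version would prefetch slice `i + 1` while slice `i` is computing, and that overlap is what pipelining could provide.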
-
**Problem**
Jan is great, but I'm limited in the number of models I can run on my 16GB GPU. I saw there is a project called [mixtral-offloading](https://github.com/dvmazur/mixtral-offloading) that cou…
-
**Describe the issue**
CKV_AWS_378 triggers on configurations that have HTTP targets. However, in many cases SSL is offloaded at the load balancer level, and the downstream targets use the HTTP protocol to inte…
-
### Is your feature request related to a problem? Please describe
We removed the misleading `CPU` indicator from the Model tables, but it would still be useful for users to have some indication of whether th…
-
Hello,
I realize this is probably a significant feature request, as it would require substantial modifications to the W5500 driver.
**Is your enhancement proposal related to a problem? Please describe…
-
Hi, thanks for the great library! I have heard people say EXL2 is very fast. I would like to try the 70B Llama model on a 24GB 4090 card, but the model cannot fit into the GPU using e.g. 4bi…
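Back-of-the-envelope arithmetic shows why a 4-bit 70B model does not fit on a 24GB card. Weight memory is roughly parameters × bits per weight / 8 bytes. The sketch counts weights only, so KV cache, activations, and quantization metadata such as scales would add more. It also assumes the card's "24GB" means 24 GiB. The function names are hypothetical.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GiB (ignores KV cache, activations, scales)."""
    return num_params * bits_per_weight / 8 / GIB


def max_bits_per_weight(num_params: float, vram_gib: float) -> float:
    """Largest average bits per weight at which the weights alone fit in `vram_gib`."""
    return vram_gib * GIB * 8 / num_params


if __name__ == "__main__":
    params = 70e9  # 70B model from the question
    vram = 24      # assumption: the 4090's 24GB treated as 24 GiB
    print(f"4-bit weights: {weight_memory_gib(params, 4.0):.1f} GiB vs {vram} GiB")
    print(f"max average bits/weight: {max_bits_per_weight(params, vram):.2f}")
```

Under these assumptions, 4-bit weights for a 70B model need about 32.6 GiB. The weights alone would fit in 24 GiB only at an average of roughly 2.95 bits per weight, before any KV cache or runtime overhead.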