-
It would be great if the inference code could be tweaked to support this. The whole thing doesn't fit in the VRAM of even an NVIDIA 4090 and takes forever.
-
I saw the line `pipe.enable_model_cpu_offload()` here https://github.com/instantX-research/InstantIR/blob/main/pipelines/sdxl_instantir.py#L113C13-L113C44 and tried the approach with the gradio app, b…
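For context, a minimal sketch of that approach on a generic diffusers pipeline; the checkpoint id below is a placeholder, not the InstantIR weights:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder checkpoint; InstantIR wires up its own pipeline and weights.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
# Moves each submodule to the GPU only while it is executing, trading
# some speed for a much smaller peak VRAM footprint.
pipe.enable_model_cpu_offload()
image = pipe("a test prompt").images[0]
```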
-
### Your current environment
N/A
### Model Input Dumps
_No response_
### 🐛 Describe the bug
When we use CPU offloading together with `torch.compile`, it errors:
```text
torch._dynamo.exc.…
```
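A sketch of the failing combination as I understand it (this repro shape is an assumption, since the traceback above is truncated):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()       # installs hooks that move modules CPU<->GPU
pipe.unet = torch.compile(pipe.unet)  # dynamo then has to trace through those hooks
pipe("a test prompt")                 # raises torch._dynamo.exc.* here
```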
-
I'm trying omnitrace with OpenMP offloading on a small Fortran test code. Depending on which system I tested on, I encountered different issues. The test code is compiled using the HPE Cray compiler, …
-
### What is the issue?
Ollama allocates the same amount of host (CPU) RAM as the model requires in VRAM when loading the model. If my understanding is correct, it's not actually using this RAM. (P…
-
I am running the full finetune distributed recipe. When I set `clip_grad_norm: 1.0` and `fsdp_cpu_offload: True`, it raises the error
`RuntimeError: No backend type associated with device type cpu`
…
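If the clipping step is running a collective on CPU-resident gradients (an assumption on my part; the excerpt is truncated), the usual fix in plain PyTorch is to initialize a CPU-capable backend alongside NCCL, e.g.:

```python
import torch.distributed as dist

# Gloo handles collectives on CPU tensors (the offloaded gradients) while
# NCCL handles the GPU side; this device-to-backend mapping syntax is a
# PyTorch 2.x feature.
dist.init_process_group(backend="cuda:nccl,cpu:gloo")
```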
-
We are using the `OffloadActivations` context manager with separate streams and pinned memory, but currently don't see any overlap between the streams.
The default stream has a D2H transfer (which i…
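For comparison, a generic sketch of the overlap we expect to see (sizes and names are illustrative; this is not the `OffloadActivations` internals):

```python
import torch

x = torch.randn(4096, 4096, device="cuda")
host_buf = torch.empty(x.shape, dtype=x.dtype, pin_memory=True)
side = torch.cuda.Stream()

side.wait_stream(torch.cuda.current_stream())  # x must be produced before copying
with torch.cuda.stream(side):
    # An async D2H copy only truly overlaps when the host buffer is pinned.
    host_buf.copy_(x, non_blocking=True)
y = x @ x  # default-stream compute that should run concurrently with the copy
torch.cuda.current_stream().wait_stream(side)  # sync before touching host_buf
```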
-
- https://www.ralfj.de/blog/
- https://noidea.dog/glue
- https://xlinux.nist.gov/dads/
- https://blog.sulami.xyz/posts/what-is-in-a-rust-allocator/
- https://quickwit.io/blog/performance-investiga…
-
## Describe This Problem
We found in production that the speed of SST compaction is unable to keep up with the speed of SST generation, leading to poor query performance. However, we are unable to give…
-
### Describe the bug
Sequential offloading doesn't work when using `pytest`, but does seem to work outside of tests.
This is an issue because we can't properly test sequential offloading on Stabl…
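A hypothetical minimal test along these lines (the tiny checkpoint id and the final assertion are assumptions, not taken from our suite):

```python
from diffusers import DiffusionPipeline

def test_sequential_cpu_offload_runs():
    pipe = DiffusionPipeline.from_pretrained(
        "hf-internal-testing/tiny-stable-diffusion-pipe"  # placeholder tiny model
    )
    # Offloads weights layer by layer; requires a CUDA device and accelerate.
    pipe.enable_sequential_cpu_offload()
    out = pipe("a test prompt", num_inference_steps=2)
    assert out.images[0] is not None
```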