-
I saw the line `pipe.enable_model_cpu_offload()` here https://github.com/instantX-research/InstantIR/blob/main/pipelines/sdxl_instantir.py#L113C13-L113C44 and tried the approach with the Gradio app, b…
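For reference, `enable_model_cpu_offload()` comes from diffusers and sequentially moves whole sub-models between CPU RAM and the GPU. Below is a minimal sketch of the usual wiring, assuming diffusers and accelerate are installed; the checkpoint and prompt are placeholders, not the InstantIR pipeline itself:

```python
# Minimal sketch; requires diffusers + accelerate and a CUDA GPU.
# The checkpoint and prompt are placeholders, not InstantIR-specific.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
# Keeps each sub-model (text encoders, UNet, VAE) in CPU RAM and moves it
# to the GPU only while it runs, lowering peak VRAM at some speed cost.
# Note: do not also call pipe.to("cuda"); offload manages placement itself.
pipe.enable_model_cpu_offload()

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("out.png")
```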
-
### Problem Description
Hi,
This is more of a question than an issue.
Is there a way to disassemble the generated amdgcn code when compiling a Fortran OpenMP program with target directives?
Curren…
-
I'm trying omnitrace with OpenMP offloading for a small Fortran test code. Depending on which system I tested on, I encountered different issues. The test code is compiled using the HPE Cray compiler, …
-
I am running the full finetune distributed recipe. When setting `clip_grad_norm: 1.0` and `fsdp_cpu_offload: True`, it raises the error
`RuntimeError: No backend type associated with device type cpu`
…
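That error typically appears when a collective runs on CPU-resident gradients while only the NCCL (CUDA) backend was initialized; registering a CPU-capable backend alongside it is a commonly suggested workaround. A minimal sketch of the pattern, not torchtune's actual recipe code (launch with e.g. `torchrun --nproc_per_node=2 repro.py`):

```python
# Minimal sketch of FSDP CPU offload + grad clipping, not torchtune's
# recipe code. Assumes a CUDA machine and launch via torchrun.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import CPUOffload, FullyShardedDataParallel as FSDP

def main():
    # With offload_params=True, sharded params/grads live in CPU RAM, so the
    # grad-norm reduction needs a CPU-capable backend (gloo), not just NCCL.
    dist.init_process_group(backend="cpu:gloo,cuda:nccl")
    torch.cuda.set_device(dist.get_rank())

    model = FSDP(
        nn.Linear(1024, 1024),
        cpu_offload=CPUOffload(offload_params=True),
        device_id=torch.cuda.current_device(),
    )
    model(torch.randn(8, 1024, device="cuda")).sum().backward()
    # Clipping must go through the FSDP wrapper so the norm is computed
    # across all shards; with gloo registered this works on CPU grads too.
    model.clip_grad_norm_(max_norm=1.0)
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```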
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
Llama.cpp seems to use the CPU instead of the GPU (RTX 3090), which makes the process very slow.
No matter the number of GPU layers I set, the model always gets offloaded to the CPU. Also, it seems th…
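One way to check this from Python is through the llama-cpp-python bindings; a hedged sketch, assuming a wheel built with CUDA enabled (e.g. via `CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python`; a plain CPU-only wheel silently ignores `n_gpu_layers`). The model path is a placeholder:

```python
# Hedged sketch via llama-cpp-python; the GGUF path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",  # placeholder
    n_gpu_layers=-1,   # offload all layers; ignored by a CPU-only build
    verbose=True,      # the startup log should show CUDA device + VRAM use
)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```

If the verbose startup log never mentions a CUDA device or offloaded layers, the build has no GPU support, which would match the all-CPU behavior described above.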
-
## Describe This Problem
We found in production that the speed of SST compaction is unable to keep up with the speed of SST generation, leading to poor query performance. However, we are unable to give…
-
Now we can train larger models with relatively low resources. Thanks to the maintainers!
-
Hi! I wonder whether unsloth will support some kind of CPU offload?
For example, I would like to finetune a 7-8B model on a 24GB GPU. Since LoRA usually results in reduced performance, it would be gr…
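To make the request concrete, here is a minimal plain-PyTorch sketch of one common form of CPU offload (optimizer-state offload), not an unsloth API: the Adam moments, which account for a large share of full-finetune VRAM, stay in host RAM and the step runs on the CPU, at the cost of host-device copies each step.

```python
# Illustrative sketch only, not unsloth's API: optimizer states live on the
# CPU, gradients are copied over for the step, updated weights copied back.
import torch
import torch.nn as nn

model = nn.Linear(4096, 4096).cuda()           # stand-in for a large model
cpu_params = [p.detach().cpu().clone() for p in model.parameters()]
opt = torch.optim.AdamW(cpu_params, lr=1e-5)   # Adam moments stay in RAM

def offloaded_step():
    for p_gpu, p_cpu in zip(model.parameters(), cpu_params):
        p_cpu.grad = p_gpu.grad.detach().cpu() # move grads to host
    opt.step()                                 # step runs on the CPU
    opt.zero_grad()
    with torch.no_grad():
        for p_gpu, p_cpu in zip(model.parameters(), cpu_params):
            p_gpu.copy_(p_cpu)                 # push updated weights back

model(torch.randn(8, 4096, device="cuda")).sum().backward()
offloaded_step()
```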