-
support showing history of memory CPU bandwidth usage
-
Hi, thanks for the library! This is more of a discussion than an issue. It seems that when using unsloth or the huggingface Trainer to full-finetune a ~1B model, the GPU utilization is >90%, while memor…
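For anyone who wants to reproduce the observation, a minimal monitoring sketch is below. It assumes the `nvidia-ml-py` package (imported as `pynvml`) is installed and is run alongside training; the numbers it prints are NVML's coarse GPU and memory-controller activity percentages, not a full bandwidth profile.

```python
# Minimal sketch: poll GPU utilization and memory-controller activity while training runs.
# Assumes the nvidia-ml-py package (imported as pynvml) is installed; run in a separate process.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

try:
    for _ in range(60):  # sample once per second for a minute
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"sm_util={util.gpu}%  mem_ctrl_util={util.memory}%  "
              f"vram_used={mem.used / 2**30:.1f} GiB")
        time.sleep(1.0)
finally:
    pynvml.nvmlShutdown()
```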
-
I ran the benchmark program on an RTX 4090 workstation and got abnormally high bus-bandwidth numbers:
``` bash
Pytorch version : 1.14.0a0+44dac51
CUDA version : 12.0
GPU : NVIDIA GeForce RTX 4090
Matrix Multiplication:
…
```
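For comparison, here is a minimal sketch of how such a matmul measurement could be taken (this is not the repository's benchmark). One common cause of impossibly high numbers is timing without `torch.cuda.synchronize()`, since CUDA kernels launch asynchronously; the sizes and iteration counts below are arbitrary assumptions.

```python
# Minimal sketch (not the repository's benchmark): timing a GPU matmul correctly.
# Missing torch.cuda.synchronize() around the timed region inflates FLOPS/bandwidth results.
import time
import torch

n = 4096
a = torch.randn(n, n, device="cuda", dtype=torch.float32)
b = torch.randn(n, n, device="cuda", dtype=torch.float32)

for _ in range(3):              # warm-up launches
    c = a @ b
torch.cuda.synchronize()

iters = 20
t0 = time.perf_counter()
for _ in range(iters):
    c = a @ b
torch.cuda.synchronize()        # wait for all kernels before stopping the clock
elapsed = (time.perf_counter() - t0) / iters

flops = 2 * n**3                # multiply-adds for an n x n matmul
bytes_moved = 3 * n * n * 4     # rough lower bound: read A, read B, write C (fp32)
print(f"{flops / elapsed / 1e12:.1f} TFLOP/s, "
      f"~{bytes_moved / elapsed / 1e9:.1f} GB/s effective (cache reuse makes this a loose estimate)")
```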
-
This will enable us to drop the emissive buffer via a debug setting and the feature table, to save some extra memory bandwidth on low-end machines.
## Test Plan
To test this:
- Set the "RenderDisa…
-
Create a benchmark for testing CPU-to-memory bandwidth and compare it with a GPU solution.
A simple start is to test neuron-only functions, because they should be independent and therefore easily parallelizable…
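As a rough sketch of what the first iteration of such a benchmark could look like, the snippet below times a plain array copy with NumPy on the CPU and with PyTorch on the GPU; both tools and the array size are my assumptions, not a design decision for the project, and a real benchmark would average many iterations.

```python
# Rough sketch of a copy-bandwidth comparison (assumed tooling: NumPy for CPU, PyTorch for GPU).
import time
import numpy as np
import torch

N = 1 << 26  # 64M float32 elements (~256 MB)

# CPU: time a simple array copy and count bytes read + written.
src = np.ones(N, dtype=np.float32)
dst = np.empty_like(src)
t0 = time.perf_counter()
np.copyto(dst, src)
cpu_gbps = 2 * src.nbytes / (time.perf_counter() - t0) / 1e9

# GPU: the same copy on the device, with synchronization around the timed region.
g_src = torch.ones(N, dtype=torch.float32, device="cuda")
g_dst = torch.empty_like(g_src)
g_dst.copy_(g_src)              # warm-up copy
torch.cuda.synchronize()
t0 = time.perf_counter()
g_dst.copy_(g_src)
torch.cuda.synchronize()
gpu_gbps = 2 * g_src.numel() * 4 / (time.perf_counter() - t0) / 1e9

print(f"CPU copy: {cpu_gbps:.1f} GB/s, GPU copy: {gpu_gbps:.1f} GB/s")
```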
-
Hi, thanks for the library! I have a naive thought: we know the deep learning forward/backward pass cannot be parallelized, because you have to compute one operation/layer before computing the next one. But wha…
-
**Is your feature request related to a problem? Please describe.**
One thing that needs to be checked before trying to parallelize code is whether or not memory bandwidth has already been sa…
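One way to do that check by hand, as a hedged sketch: measure the bandwidth a streaming kernel actually achieves and compare it against the platform's theoretical peak. The `PEAK_GBPS` value below is a made-up placeholder, and the single-shot timing is only a first approximation.

```python
# Sketch: estimate achieved memory bandwidth of a streaming loop and compare it with a
# user-supplied theoretical peak. PEAK_GBPS is a placeholder you must fill in for your machine.
import time
import numpy as np

PEAK_GBPS = 50.0                          # placeholder: look up your platform's peak DRAM bandwidth
a = np.ones(1 << 26, dtype=np.float64)
b = np.ones_like(a)

t0 = time.perf_counter()
c = a + b                                 # STREAM-add style kernel: reads a and b, writes c
elapsed = time.perf_counter() - t0

achieved = 3 * a.nbytes / elapsed / 1e9   # bytes read (a, b) plus bytes written (c)
print(f"achieved ~{achieved:.1f} GB/s, {100 * achieved / PEAK_GBPS:.0f}% of assumed peak")
```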
-
### Area
- [X] Scheduler
- [ ] Controller
- [ ] Helm Chart
- [ ] Documents
### Other components
_No response_
### What happened?
While using the trimaran plugin, the following error appears in the log:
E…
-
**Is this a bug report or feature request?**
* Bug Report
**Deviation from expected behavior:**
Low performance (~66 IO/s per OSD; see benchmark details)
**Expected behavior:**
When s…
-
I am encountering issues when using non-element-wise optimizers such as Adam-mini with DeepSpeed.
The documentation reads:
> The FP16 Optimizer is designed to maximize the achievable…
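For context, this is roughly how the optimizer is being wired in; a minimal hedged sketch, assuming the non-element-wise optimizer is passed to DeepSpeed as a client optimizer. The config values are illustrative, and `torch.optim.Adam` stands in where Adam-mini would actually be constructed.

```python
# Hedged sketch of passing a client-side (non-element-wise) optimizer to DeepSpeed.
# Config values are illustrative; torch.optim.Adam is a stand-in for Adam-mini here.
import deepspeed
import torch

model = torch.nn.Linear(1024, 1024)                        # stand-in model
# optimizer = <construct Adam-mini here>                   # the optimizer the issue is about
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # stand-in so the sketch runs

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "fp16": {"enabled": True},     # engages DeepSpeed's FP16 optimizer wrapper
}

# DeepSpeed wraps the client optimizer; this is where element-wise assumptions can bite.
engine, engine_optimizer, _, _ = deepspeed.initialize(
    model=model, optimizer=optimizer, config=ds_config
)
```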