-
Allow the user to disable certain passes in the optimizer based on their requirements.
-
Hi,
I have some questions related to the paper:
1) Which FP8 format (E4M3 / E5M2) do you use for the first Adam moment? Do you use delayed scaling or just-in-time scaling?
2) What about the weig…
-
# Describe the bug
A PSCollection should contain optimizer states in addition to weights. The optimizer state tensors are obtained directly [from EmbeddingCollection Module](https://github.com/pytorch/to…
-
### Feature request
I would like to know whether the [Prodigy](https://github.com/konstmish/prodigy) optimizer could be implemented in bnb with 8-bit support.
### Motivation
Prodigy is now widely used…
-
**Describe the bug**
As in PR https://github.com/microsoft/DeepSpeed/pull/5259, the ZeRO optimizer also needs to be fixed:
1. partition logic of expert params.
3. average_tensor used in …
-
Hello, is there any code available for the LBP optimizer? I would like to reproduce your work, thank you very much! I ask because when I look at the first-stage results, the results are always unsatisfacto…
-
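Hello! I would like to ask: in your paper I saw that SGD was chosen as the optimizer. When I tried the Adam and AdamW optimizers instead, the loss became NaN after only a few epochs of training. Have you run into a similar problem, or did you only ever use SGD?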
-
### Describe the issue
I exported Whisper models to ONNX directly from the whisper module. I wrote an inference script, and the results are correct.
I want to reduce the runtime so I used the bart tr…
-
![image](https://github.com/OptimalFoundation/nadir/assets/11348086/324fe463-8dde-47ca-9c9a-09b1d0330874)
-
While using the https://github.com/rockchip-linux/rknn-toolkit/releases project to convert a pt model to an rknn model for the rv1126, I ran the command
python ../../../../../common/rknn_converter/rknn_convert.py --yml_path ./yolov5_6_7_backup.yml --python_api…