SEONHOK closed this issue 11 months ago
Hello, thank you for your interest in our work!
Algorithm 1 describes how the model moves from a dense state to a progressively sparser one. A larger ξ yields a sparser model after training, so r starts at a relatively large value (usually 8 or 16) and the rank decreases as ξ increases. I suspect you read $\mathcal{M}'\textit{.add}(\mathcal{M})$ as growing the model over the iterations. What it actually means is adding the new checkpoint $\mathcal{M}$ to $\mathcal{M}'$, where $\mathcal{M}'$ is a set of checkpoints. We apologize for the imprecise notation and the confusion it caused.
The implementation of Algorithm 1 can be found in `run_glue.py`. You can run it with the `run_glue_sora_schedule_dense.sh` script. We have updated the README to mention this explicitly.
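To make the schedule above concrete, here is a minimal toy sketch of the dense-to-sparse idea. It is not the code in `run_glue.py`. The function names, the gate vector, and the use of soft-thresholding as the sparsifying step are all assumptions made for illustration, and the training steps that would happen between thresholds are omitted. The list `checkpoints` plays the role of $\mathcal{M}'$.

```python
import random


def soft_threshold(x, xi):
    """Shrink x toward zero by xi; values in [-xi, xi] become exactly 0."""
    if x > xi:
        return x - xi
    if x < -xi:
        return x + xi
    return 0.0


def sparsify_schedule(gate, xis):
    """Apply a sequence of thresholds and record a checkpoint after each.

    gate: hypothetical per-rank gate values (length r, dense at the start).
    xis:  thresholds, assumed here to be given in increasing order.
    """
    checkpoints = []  # stands in for M', the set of checkpoints
    for xi in xis:
        # In a real run, training would happen here; this toy only thresholds.
        gate = [soft_threshold(g, xi) for g in gate]
        rank = sum(1 for g in gate if g != 0.0)  # remaining non-zero ranks
        # Corresponds to M'.add(M): store the checkpoint, do not grow the model.
        checkpoints.append({"xi": xi, "rank": rank, "gate": list(gate)})
    return checkpoints


random.seed(0)
initial_gate = [random.uniform(-1.0, 1.0) for _ in range(8)]  # r = 8
for ckpt in sparsify_schedule(initial_gate, xis=[0.05, 0.1, 0.2, 0.4]):
    print(f"xi={ckpt['xi']}: rank={ckpt['rank']}")
```

Because each threshold is applied to the already-shrunk gate, the recorded rank can only stay the same or drop from one checkpoint to the next, which matches the "large r first, smaller rank as ξ grows" reading.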
I hope my response helps you.
Thanks a lot for your kind response! It really helps!
Hi! I have a question about Algorithm 1 in your paper.
Does Algorithm 1 say that you start from rank r=1 and increase the rank while increasing ξ? It looks like Algorithm 1 grows the model over the iterations.
Also, I cannot find the implementation of Algorithm 1. Could you point me to its location?
Many thanks in advance!