Question about the paper and implementation.

SEONHOK commented 11 months ago

Hi! I have a question about Algorithm 1 in your paper.

Does Algorithm 1 say that you start from rank r=1 and increase the rank while increasing ξ? It looks like Algorithm 1 grows the model over the iterations.
Also, I cannot find the implementation for algorithm 1. Could you point out the location?

Many thanks in advance!

telxt commented 11 months ago

Hello, thank you for your interest in our work!

Algorithm 1 describes a process in which the model starts dense and becomes progressively sparser. A larger ξ yields a sparser model after training. Therefore, r starts at a relatively large value (usually 8 or 16), and increasing ξ causes the rank to decrease. I suspect you interpreted $\mathcal{M}'\textit{.add}(\mathcal{M})$ as growing the model over the iterations. What it actually means is adding the new checkpoint $\mathcal{M}$ to $\mathcal{M}'$, where $\mathcal{M}'$ is a set of checkpoints. We apologize for the imprecise notation and the confusion it caused.
The implementation of Algorithm 1 can be found in run_glue.py. You can run it with the run_glue_sora_schedule_dense.sh script. We have updated the README to mention this explicitly.
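
To make the dense-to-sparse reading concrete, here is a minimal toy sketch of the schedule. It is not the code in run_glue.py: the names `Checkpoint`, `soft_threshold`, and `schedule` are hypothetical, and the soft-thresholding of a gate vector is only an assumed stand-in for a real training stage. It shows just the two points above: the rank can only shrink as ξ grows, and `checkpoints.append(...)` plays the role of $\mathcal{M}'\textit{.add}(\mathcal{M})$, collecting checkpoints rather than growing the model.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record of one stage of the schedule."""
    xi: float
    gate: Tuple[float, ...]

    @property
    def rank(self) -> int:
        # Effective rank = number of gate entries that are still non-zero.
        return sum(1 for g in self.gate if g != 0.0)


def soft_threshold(value: float, xi: float) -> float:
    """Shrink value toward zero by xi; anything within [-xi, xi] becomes 0."""
    if value > xi:
        return value - xi
    if value < -xi:
        return value + xi
    return 0.0


def schedule(initial_gate: Sequence[float], xi_values: Sequence[float]) -> List[Checkpoint]:
    """Toy dense-to-sparse schedule: one checkpoint per xi, in increasing order."""
    checkpoints: List[Checkpoint] = []  # the set M' of checkpoints
    gate = list(initial_gate)           # dense start: every rank component active
    for xi in xi_values:
        # Stand-in for a training stage at this xi (assumption: each stage
        # continues from the previous stage's state).
        gate = [soft_threshold(g, xi) for g in gate]
        # Counterpart of M'.add(M): store the checkpoint, the model is not enlarged.
        checkpoints.append(Checkpoint(xi=xi, gate=tuple(gate)))
    return checkpoints


if __name__ == "__main__":
    start = [4.0, -3.0, 2.5, 2.0, -1.5, 1.0, 0.5, 0.25]  # r = 8 at the start
    for ckpt in schedule(start, [0.0, 0.5, 1.0, 2.0]):
        print(f"xi={ckpt.xi}: rank={ckpt.rank}")
```

With these made-up numbers the ranks come out as 8, 6, 4, 1, so a larger ξ never increases the rank.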

I hope my response helps you.

SEONHOK commented 11 months ago

Thanks a lot for your kind response! It really helps!

TsinghuaC3I / SoRA