syGOAT opened this issue 9 months ago
Thank you! I'm glad people are finding it so useful.
That's exactly correct: `density` is the fraction of delta parameters retained. When using DARE with TIES, no second sparsification is applied; the merge simply performs the sign-consensus step from TIES on the sparsified, rescaled delta produced by DARE. So the density is the same throughout.
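To make the meaning of `density` concrete, here is a minimal NumPy sketch of the DARE step on a single delta tensor. It assumes a plain array stands in for "fine-tuned weights minus base weights"; the function name `dare` is hypothetical, and this is an illustration, not mergekit's torch implementation. Each entry survives with probability `density` and survivors are scaled by `1 / density`, so the fraction retained is about `density` and each entry keeps its expected value.

```python
import numpy as np


def dare(delta: np.ndarray, density: float, rng: np.random.Generator) -> np.ndarray:
    """DARE: drop each delta entry with probability 1 - density, rescale survivors."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    keep = rng.random(delta.shape) < density  # True with probability `density`
    # Rescaling by 1/density keeps the expected value of each entry unchanged.
    return np.where(keep, delta / density, 0.0)


rng = np.random.default_rng(0)
delta = rng.normal(size=100_000)  # hypothetical stand-in for fine-tuned minus base weights
density = 0.3
sparse = dare(delta, density, rng)

kept = sparse != 0
print(f"fraction retained: {kept.mean():.3f}")  # close to 0.3
# Survivors are the original entries scaled up by 1/density.
print(np.allclose(sparse[kept], delta[kept] / density))  # True
```

In a `dare_ties` merge, the TIES sign election would then run on arrays like `sparse`; it does not draw a new mask from `density`.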
Thanks for your explanation! I think I understand your point now.
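In the original TIES-Merging paper (https://arxiv.org/abs/2306.01708), TIES consists of three steps: Trim, Elect, and Disjoint Merge. Trim keeps the top k% of each task vector's entries by magnitude and resets the rest to zero. Elect builds a sign vector by choosing, for each parameter, the sign of the summed trimmed deltas. Disjoint Merge then averages, for each parameter, only the values whose sign agrees with the elected sign. In mergekit, `dare_ties` means using DARE in place of Trim, with the other steps (Elect and Disjoint Merge) remaining the same.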
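The three steps, with Trim swappable for DARE, can be sketched in NumPy as below. This is a minimal illustration under stated assumptions: all models get equal weight, there is no TIES scaling factor, and the names `trim_topk`, `trim_dare`, and `ties_merge` are hypothetical. It is not mergekit's implementation, which works on torch tensors and adds options such as per-model weights.

```python
from typing import Callable, Optional, Sequence

import numpy as np

TrimFn = Callable[[np.ndarray, float, Optional[np.random.Generator]], np.ndarray]


def trim_topk(delta: np.ndarray, density: float, rng=None) -> np.ndarray:
    """TIES Trim: keep the top `density` fraction of entries by magnitude."""
    k = int(round(density * delta.size))
    if k == 0:
        return np.zeros_like(delta)
    threshold = np.sort(np.abs(delta).ravel())[-k]
    # Entries tied at the threshold are all kept.
    return np.where(np.abs(delta) >= threshold, delta, 0.0)


def trim_dare(delta: np.ndarray, density: float, rng: np.random.Generator) -> np.ndarray:
    """DARE in place of Trim: random drop with probability 1 - density, then rescale."""
    keep = rng.random(delta.shape) < density
    return np.where(keep, delta / density, 0.0)


def ties_merge(base: np.ndarray, finetuned: Sequence[np.ndarray], density: float,
               trim: TrimFn, rng: Optional[np.random.Generator] = None) -> np.ndarray:
    # Step 1: sparsify every task vector with the chosen trim function.
    deltas = np.stack([trim(m - base, density, rng) for m in finetuned])
    # Step 2 (Elect): per-parameter sign of the summed sparsified deltas.
    elected = np.sign(deltas.sum(axis=0))
    # Step 3 (Disjoint Merge): mean over nonzero entries that agree with the elected sign.
    agree = (np.sign(deltas) == elected) & (deltas != 0)
    count = agree.sum(axis=0)
    merged = np.where(agree, deltas, 0.0).sum(axis=0) / np.maximum(count, 1)
    return base + merged


base = np.zeros(4)
models = [np.array([0.5, -0.1, 0.2, 0.0]), np.array([0.4, 0.3, -0.6, 0.1])]
print(ties_merge(base, models, 0.5, trim_topk))  # values 0.45, 0, -0.6, 0
print(ties_merge(base, models, 0.5, trim_dare, np.random.default_rng(0)))
```

Only the `trim` argument differs between the two calls; `density` is consumed once by the trim step, and Elect and Disjoint Merge never use it.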
Congratulations on the significant breakthrough in model merging! I'd like to ask you a question. I use `dare_ties` to merge some models. Here is my YAML file:

I believe `density` here refers to the fraction of delta parameters randomly retained by DARE. What is the density during the TIES stage after DARE has been applied? Is it the same as the DARE density, or is there a specific way to set it?