Hey,
Thanks for your great work! I have a question about the Breadcrumbs sparsification implementation in https://github.com/arcee-ai/mergekit/blob/57e7d14e2a732f532970e2c9dada00e2d8f15a7a/mergekit/sparsify.py#L61-L100
From the Model Breadcrumbs paper, the top-beta and bottom-gamma pruning appears to be applied to each layer of a task vector independently. In your toolkit's implementation, however, the pruning thresholds seem to be computed globally across all layers of the task vector. Wouldn't this deviate from what the paper describes, and potentially prune incorrectly, if the weight-magnitude statistics differ substantially across layers?
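For concreteness, here is a minimal sketch of the per-layer behavior I understand the paper to describe (the function name, and treating beta/gamma as simple fractions of entries to drop, are my own assumptions for illustration; the paper's exact mask definition may differ in detail):

```python
import torch

def breadcrumbs_mask_per_layer(delta: torch.Tensor, beta: float, gamma: float) -> torch.Tensor:
    """Illustrative per-layer Breadcrumbs-style mask: drop the top-beta fraction
    (large-magnitude outliers) and the bottom-gamma fraction (near-zero noise)
    of entries, ranked by absolute magnitude within THIS layer only."""
    flat = delta.abs().flatten()
    n = flat.numel()
    k_top = int(n * beta)      # entries to drop from the large-magnitude end
    k_bottom = int(n * gamma)  # entries to drop from the small-magnitude end
    ranks = flat.argsort()     # indices sorted ascending by magnitude
    mask = torch.ones(n, dtype=torch.bool)
    if k_bottom > 0:
        mask[ranks[:k_bottom]] = False   # prune smallest entries
    if k_top > 0:
        mask[ranks[n - k_top:]] = False  # prune largest entries
    return mask.reshape(delta.shape)
```

A global variant would instead rank the concatenation of all layers' deltas and apply the same cutoffs once, so a layer whose entries are uniformly small could fall entirely below the global bottom-gamma cutoff and be zeroed out, which is the scenario I'm worried about.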
Please correct me if I am misunderstanding something. Thanks