NeonBohdan closed this issue 3 months ago.
Already fixed with https://github.com/arcee-ai/mergekit/pull/186.

With this PR: bfloat16: 7% -> 7%; float32: 1e-6% -> 1e-7%.
It improved the situation for float32, but not at all for bfloat16.

That's still very high: for a single-model "merge", 7% of the values will simply be dropped.
This could be optimized here: https://github.com/arcee-ai/mergekit/blob/9a541798231dc4c1e088caf271b04474685e4dcb/mergekit/merge_methods/generalized_task_arithmetic.py#L1
It's always possible to use `dare_linear` or `task_arithmetic` instead, but can the mathematical accuracy of this method still be improved?
I suspect what's happening here is that some of the values in the task vector are exactly zero. The sign function returns 0 for an input of 0. This shouldn't be a problem; the end result is the same either way, as we'd just be multiplying 0 by ±1 instead of 0 by 0.

This would explain the difference between bfloat16 and float32 as well: the lower precision makes it much more likely that a value rounds to exactly zero.
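That hypothesis is easy to sanity-check. Here is a minimal standalone sketch (not mergekit code; the weight and delta magnitudes are made up) showing that computing the task vector in bfloat16 produces far more exact zeros, for which `torch.sign()` returns 0:

```python
import torch

torch.manual_seed(0)
base = torch.randn(1_000_000)                 # stand-in base weights
tuned = base + torch.randn(1_000_000) * 1e-4  # small finetuning deltas

for dtype in (torch.float32, torch.bfloat16):
    # Task vector computed in the given precision. In bfloat16, many
    # base/tuned pairs round to the same value, so the delta is exactly
    # zero and torch.sign() returns 0 for that entry.
    delta = tuned.to(dtype) - base.to(dtype)
    frac = (torch.sign(delta) == 0).float().mean().item()
    print(f"{dtype}: {frac:.2%} of task-vector entries have sign 0")
```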
I decided to debug the following merge YAML:
Clearly it skips `sparsify` and scales the task vector down with a 0.75 weight. But it also performs `sign consensus`. With only one task vector, the consensus mask should simply be all `True`. Instead, some percentage of the entries are `False`: bfloat16: 7%; float32: 1e-6%. Even float32 can't manage an all-`True` mask.
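For reference, this measurement can be reproduced with a standalone sketch, modeled loosely on the linked file but simplified (`consensus_mask` is my own hypothetical helper, not mergekit's actual function): take the elementwise sign of the summed deltas, fill zero signs with a global majority sign, and compare against each model's own signs.

```python
import torch

def consensus_mask(deltas: torch.Tensor) -> torch.Tensor:
    """Simplified sign-consensus mask for a [num_models, num_params] stack."""
    summed = deltas.sum(dim=0)
    elem_sign = summed.sign()
    # Exact zeros in the elementwise sign get filled with the global
    # majority sign, so a zero-valued delta (whose own sign is 0) ends
    # up disagreeing with it and shows up as False in the mask.
    majority = 1.0 if summed.sum() >= 0 else -1.0
    elem_sign[elem_sign == 0] = majority
    return deltas.sign() == elem_sign

# Single-model case: build a task vector whose small deltas round to
# exactly zero in bfloat16, then measure the False fraction of the mask.
torch.manual_seed(0)
base = torch.randn(1_000_000)
tuned = base + torch.randn(1_000_000) * 1e-4
delta = (tuned.to(torch.bfloat16) - base.to(torch.bfloat16)).unsqueeze(0)
print(f"False entries: {(~consensus_mask(delta)).float().mean().item():.2%}")
```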
Can the code responsible for this be updated to achieve higher accuracy? https://github.com/arcee-ai/mergekit/blob/9a541798231dc4c1e088caf271b04474685e4dcb/mergekit/merge_methods/generalized_task_arithmetic.py#L196
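One possible direction (an assumption on my part, not an existing mergekit change): treat exactly-zero deltas as agreeing with the consensus, since multiplying 0 by ±1 is a no-op anyway. Applied to the sketch above:

```python
import torch

def consensus_mask_tolerant(deltas: torch.Tensor) -> torch.Tensor:
    """Like consensus_mask above, but zero deltas never count as dissent."""
    summed = deltas.sum(dim=0)
    elem_sign = summed.sign()
    majority = 1.0 if summed.sum() >= 0 else -1.0
    elem_sign[elem_sign == 0] = majority
    # A zero delta contributes nothing either way, so mask it True
    # instead of letting its sign-0 entry register as a disagreement.
    return (deltas.sign() == elem_sign) | (deltas == 0)
```

With this change, the single-model mask comes out all `True` in both dtypes, since the only `False` entries were the exact zeros.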