brucethemoose opened this issue 1 year ago
A total anywhere in the 0-1.2 range will almost certainly be fine. You can probably get a lot weirder than that, but I'm still experimenting myself.
The paper this method comes from (https://arxiv.org/abs/2311.03099) shows great results with a drop rate as high as 0.9, which corresponds to a density of 0.1. I haven't gone that low myself yet; densities of 0.3-0.5 have worked for me so far.
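For intuition, the core of the method is just drop-and-rescale applied to the task vector (finetuned weights minus base weights). A rough PyTorch sketch of that idea, not mergekit's actual code, with `base` and `finetuned` as placeholder tensors:

import torch

def dare_sparsify(delta: torch.Tensor, density: float) -> torch.Tensor:
    # delta = finetuned - base (the "task vector"); density = 1 - drop_rate
    mask = torch.bernoulli(torch.full_like(delta, density))
    # Rescaling the survivors by 1/density keeps the expected value of delta unchanged
    return delta * mask / density

# e.g. keep ~55% of the delta entries; base/finetuned are placeholder tensors
sparse_delta = dare_sparsify(finetuned - base, density=0.55)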
I'd be interested to hear if you get any fun results or run into any trouble with the code.
Well, for starters, the install from the new branch doesn't quite work: I had to manually copy the scripts and merge_methods folders into pip's install directory.
Mergekit doesn't like the Yi tokenizer, but that's fine, I can just use the llama one or copy it over.
Also, my first test merge seems to be corrupt and makes transformers error out with a bunch of strange CUDA asserts. A ties merge from the main branch 5 days ago worked fine. The config was:
models:
  - model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
    # no parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-M-v1.2
    parameters:
      weight: 0.62
      density: 0.55
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.56
      density: 0.55
merge_method: dare_ties
base_model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
parameters:
  int8_mask: true
dtype: bfloat16
...
../aten/src/ATen/native/cuda/Indexing.cu:1237: indexSelectSmallIndex: block: [28,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
File "/home/alpha/AI/text-generation-webui/modules/callbacks.py", line 57, in gentask
ret = self.mfunc(callback=_callback, *args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/text-generation-webui/modules/text_generation.py", line 355, in generate_with_callback
shared.model.generate(**kwargs)
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 1719, in generate
return self.sample(
^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 2801, in sample
outputs = self(
^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 879, in forward
inputs_embeds = self.embed_tokens(input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 163, in forward
return F.embedding(
^^^^^^^^^^^^
File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/functional.py", line 2237, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
*shrug* My testing time is limited, but I will poke at it some more soon, lol.
That looks like a tokenizer mismatch issue to me. Did you maybe copy in the tokenizer for Tess-M-v1.2? The added tokens not present in the base model could cause that particular error.
(You can probably also work with the Yi tokenizer class directly if you pass --trust-remote-code, if that's your jam.)
I'll see if I can replicate the setup issue too, that sounds annoying.
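If you want to sanity-check the tokenizer theory, one quick thing to try is comparing the tokenizer's vocab size against the merged model's embedding table. Rough sketch, with the model path as a placeholder:

from transformers import AutoModelForCausalLM, AutoTokenizer

merged = "/path/to/merged-model"  # placeholder path
tok = AutoTokenizer.from_pretrained(merged)
model = AutoModelForCausalLM.from_pretrained(merged)

embed_rows = model.get_input_embeddings().weight.shape[0]
print(len(tok), embed_rows)
# Any token id >= embed_rows (e.g. added tokens from a copied tokenizer)
# is exactly what trips the `srcIndex < srcSelectDimSize` assert above.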
That is precisely what I did, to the dot. You probably don't have to replicate the model config, lol.
Yeah, it works with the base model tokenizer, thanks. In fact, a few responses from the merged model seem pretty smart.
Any positive results from parameter tweaking yet?
Also, is there a particular reason not to go higher on density? Shouldn't values above 0.5 "preserve" more of the finetuning from the models?
Are weights that add up to ~1.2 a sane target? And what's a sane value for the Bernoulli density thing?