arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

Input should be a valid dictionary or instance of MergeConfiguration #418

Open Hugo-Calero opened 3 weeks ago

Hugo-Calero commented 3 weeks ago

Hello!

I am trying to use the following notebook (https://github.com/arcee-ai/mergekit/blob/main/notebook.ipynb) with this yaml configuration file:

'''
slices:

merge_method: passthrough
dtype: bfloat16
'''

I am getting the following error when running the code. Does mergekit require a specific version of pydantic? I have 2.7.1 installed. If anyone has experienced something similar or knows how to solve it, any help would be greatly appreciated :)

Error:

Traceback (most recent call last):
  File "/Users/hugocalero/Desktop/PruneMe/slice_with_mergekit/merge_me.py", line 18, in <module>
    run_merge(
  File "/Users/hugocalero/Desktop/PruneMe/slice_with_mergekit/mergekit/mergekit/merge.py", line 87, in run_merge
    ).plan_to_disk(out_path=out_path)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hugocalero/Desktop/PruneMe/slice_with_mergekit/mergekit/mergekit/plan.py", line 259, in plan_to_disk
    self._plan()
  File "/Users/hugocalero/Desktop/PruneMe/slice_with_mergekit/mergekit/mergekit/plan.py", line 308, in _plan
    ConfigReader(
  File "/Users/hugocalero/Desktop/PruneMe/venv/lib/python3.11/site-packages/pydantic/main.py", line 176, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for ConfigReader
config
  Input should be a valid dictionary or instance of MergeConfiguration [type=model_type, input_value=MergeConfiguration(merge...te=None, out_dtype=None), input_type=MergeConfiguration]
    For further information visit https://errors.pydantic.dev/2.7/v/model_type

metric-space commented 2 weeks ago

Hey @Hugo-Calero, the following seems to work for me:

slices:
  - sources:
    - model: HuggingFaceTB/SmolLM-360M-Instruct
      layer_range: [0, 2]
    - model: HuggingFaceTB/SmolLM-360M-Instruct
      layer_range: [11,32]

merge_method: passthrough
dtype: bfloat16

Is it possible that the space between slices and sources in your config file above is the problem?

Hugo-Calero commented 2 weeks ago

Hello,

Thank you for your answer. I have tried deleting that space, but I am still getting the same error saying the input is not a valid dictionary or instance of MergeConfiguration.

The merge_config variable that is loaded is:

merge_method='passthrough' slices=[OutputSliceDefinition(sources=[InputSliceDefinition(model=ModelReference(model=ModelPath(path='HuggingFaceTB/SmolLM-360M-Instruct', revision=None), lora=None, override_architecture=None), layer_range=(0, 2), parameters=None), InputSliceDefinition(model=ModelReference(model=ModelPath(path='HuggingFaceTB/SmolLM-360M-Instruct', revision=None), lora=None, override_architecture=None), layer_range=(11, 32), parameters=None)], base_model=None, residual_weight=None, parameters=None)] models=None parameters=None base_model=None dtype='bfloat16' tokenizer_source=None tokenizer=None chat_template=None out_dtype=None
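Since the loaded value is already a MergeConfiguration and the error still reports input_type=MergeConfiguration, one possible explanation (an assumption, not something confirmed in this thread) is that the class is being imported twice under different module paths, for example once from an installed mergekit and once from the local clone under slice_with_mergekit/mergekit. Two classes loaded that way share a name but are distinct objects, so an instance check against the other copy fails. The stdlib-only sketch below reproduces that mechanism with a stand-in class; the module names `mergekit_copy_a` and `mergekit_copy_b` are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path

# Stand-in for the real class; only the name matters for this illustration.
SOURCE = "class MergeConfiguration:\n    pass\n"


def load_as(module_name: str, path: Path):
    """Execute the file at `path` as a fresh module called `module_name`."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.py"
    path.write_text(SOURCE)
    # The same source file, loaded under two different (hypothetical) names.
    first = load_as("mergekit_copy_a", path)
    second = load_as("mergekit_copy_b", path)

instance = first.MergeConfiguration()

same_name = type(instance).__name__ == second.MergeConfiguration.__name__
is_instance_of_first = isinstance(instance, first.MergeConfiguration)
is_instance_of_second = isinstance(instance, second.MergeConfiguration)

print("same class name:", same_name)
print("instance of first copy:", is_instance_of_first)
print("instance of second copy:", is_instance_of_second)
```

If this is what is happening here, comparing `inspect.getfile(MergeConfiguration)` in merge_me.py with the same call inside the mergekit code that raises the error should show two different file locations.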