arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

EvoMerge Genome Bug #323

Closed Jacobsolawetz closed 5 months ago

Jacobsolawetz commented 6 months ago
  File "/opt/conda/lib/python3.10/site-packages/cma/evolution_strategy.py", line 4392, in fmin2
    res = fmin(objective_function, x0, sigma0,
  File "/opt/conda/lib/python3.10/site-packages/cma/evolution_strategy.py", line 4818, in fmin
    X, fit = es.ask_and_eval(parallel_objective or objective_function,
  File "/opt/conda/lib/python3.10/site-packages/cma/evolution_strategy.py", line 2528, in ask_and_eval
    f = func(x, *args) if kappa == 1 else \
  File "/mergekit/mergekit/scripts/evolve.py", line 264, in parallel_evaluate
    res = strat.evaluate_genotypes(x)
  File "/mergekit/mergekit/evo/strategy.py", line 100, in evaluate_genotypes
    return list(
  File "/opt/conda/lib/python3.10/site-packages/ray/util/actor_pool.py", line 113, in get_generator
    yield self.get_next()
  File "/opt/conda/lib/python3.10/site-packages/ray/util/actor_pool.py", line 309, in get_next
    return ray.get(future)
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/worker.py", line 2623, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/worker.py", line 861, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::InMemoryMergeEvaluator.evaluate_genotype() (pid=11347, ip=172.17.0.2, actor_id=b1e080121faa41fb8c90c12001000000, repr=<mergekit.evo.actors.InMemoryMergeEvaluator object at 0x7f71d28f27d0>)
  File "/mergekit/mergekit/evo/actors.py", line 303, in evaluate_genotype
    return self.evaluate(genotype)
  File "/mergekit/mergekit/evo/actors.py", line 235, in evaluate
    config = self.genome.genotype_merge_config(genotype)
  File "/mergekit/mergekit/evo/genome.py", line 101, in genotype_merge_config
    (n_layer_groups, n_models, n_params) = genotype.shape
ValueError: not enough values to unpack (expected 3, got 0)
genome:
        models:
                - /pt
                - MaziyarPanahi/Calme-7B-Instruct-v0.2
                - Equall/Saul-Base
        layer_granularity: 8
        base_model: mistralai/Mistral-7B-v0.1
        merge_method: dare_ties
tasks:
        - name: tr_gpt
          weight: 0.005
          metric: chrf,none
        - name: agieval_lsat_ar
          weight: 1.0
        - name: truthfulqa_mc2
          weight: 0.2
        - name: legalbench_insurance_policy_interpretation_multiple_choice
          weight: 0.5
        - name: legalbench_canada_tax_court_outcomes_multiple_choice
          weight: 0.5

This ran for 224 of 2000 evals before crashing.
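For reference, the "expected 3, got 0" in the last traceback line means `genotype.shape` was an empty tuple when it was unpacked. A minimal sketch of how that message arises, assuming the genotype is a NumPy array that arrived as a 0-dimensional scalar (the value `0.5` is a placeholder, and this is not a confirmed reproduction of the bug):

```python
import numpy as np

# Assumption: the genotype reached genotype_merge_config as a 0-d array.
# A 0-d array has shape (), so there is nothing to unpack into three names.
genotype = np.array(0.5)

try:
    (n_layer_groups, n_models, n_params) = genotype.shape
    message = None
except ValueError as exc:
    message = str(exc)

print(genotype.shape)
print(message)
```

This only shows that a 0-d array produces the same error text; it does not explain why the optimizer would hand back such an array after 224 evals.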

Jacobsolawetz commented 6 months ago

Might this be fixed by https://github.com/arcee-ai/mergekit/pull/307?

cg123 commented 5 months ago

#307 adds some safety rails to try to prevent this particular crash. Since I haven't actually been able to reproduce it, I'm not 100% sure the merge will be able to continue from that point, but maybe! Leaving this open until we know.
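As a rough illustration of what a safety rail of this kind could look like, here is a sketch of a shape and finiteness check run before the genotype is unpacked. This is not the code from #307: `validate_genotype` is a hypothetical helper, and the expected shape `(4, 3, 2)` is only an illustrative value.

```python
import numpy as np

def validate_genotype(genotype, expected_shape):
    """Hypothetical guard: fail early with a descriptive error.

    Returns the genotype as a float array if it has the expected
    (n_layer_groups, n_models, n_params) shape and only finite values.
    """
    arr = np.asarray(genotype, dtype=np.float64)
    expected = tuple(expected_shape)
    if arr.shape != expected:
        raise ValueError(
            f"genotype has shape {arr.shape}, expected {expected}"
        )
    if not np.all(np.isfinite(arr)):
        raise ValueError("genotype contains NaN or infinite values")
    return arr

# Illustrative shape only; the real dimensions depend on the genome config.
good = validate_genotype(np.zeros((4, 3, 2)), (4, 3, 2))

try:
    validate_genotype(np.array(0.5), (4, 3, 2))
    error = None
except ValueError as exc:
    error = str(exc)
```

A check like this would not make the run recoverable on its own, but it would replace the bare unpacking error with a message that states which shape actually arrived.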