delphi-suite / delphi

small language models training made easy
Apache License 2.0

llama2 & mamba training configs #113

Closed SrGonao closed 5 months ago

SrGonao commented 5 months ago

(I messed up the name of this branch. I tried renaming it, but it didn't update.) I added a default mamba and llama 2 base config (they are actually the same) and then the different llama 2 and mamba sizes.

SrGonao commented 5 months ago

@jettjaniak, about inheritance: I can't get it working. E.g., if I have:

"model_config": {
        "model_class": "MambaForCausalLM",
        "vocab_size": 4096,
        "conv_kernel": 4,
        "expand": 2,
        "use_bias": false,
        "use_conv_bias": true,
        "hidden_act": "silu",
        "initializer_range": 0.1,
        "residual_in_fp32": true,
        "tie_word_embeddings": true
    }

in one mamba.json file and:

    "hidden_size": 952,
    "num_hidden_layers": 8
  }

in a 50M.json file, and I run python scripts/run_training.py --config_files mamba.json --config_files 50M.json, I get KeyError: 'model_class'

Could you verify it, @jaidhyani? I have changed all the files to follow this structure either way.

jaidhyani commented 5 months ago

--config_files should be passed exactly once; if you want to pass multiple configs, give them as space-separated arguments to that single flag.
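
Using the file names from above, that would be something along the lines of:

    python scripts/run_training.py --config_files mamba.json 50M.json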

jettjaniak commented 5 months ago

TODOs

jaidhyani commented 5 months ago

TODOs

  • [ ] fix existing tests so they can access configs in the new location

AFAIK there's not really a good way to include static files from outside of a package's root directory, by design. Basically all documentation on the subject assumes that static files live under the package root - e.g. src/ for us. I've just spent some time messing around with ways to get around this with MANIFEST.in or different setup.py options, but couldn't get anything working.
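
For context, the standard setuptools mechanism (package_data / include_package_data) only reaches files inside the package directory, which is why paths outside src/ are awkward. A minimal illustrative sketch, not our actual setup:

    from setuptools import setup, find_packages

    setup(
        name="delphi",
        package_dir={"": "src"},
        packages=find_packages(where="src"),
        # package_data paths are resolved relative to each package,
        # so files outside src/<package>/ cannot be listed here
        package_data={"delphi": ["static/*.json"]},
        include_package_data=True,
    )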

jettjaniak commented 5 months ago

can't we just determine the test file path and navigate from there to repo root?
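
A minimal sketch of that idea, assuming the test file sits one level below the repo root (the tests/ depth and configs/ directory name are assumptions):

    from pathlib import Path

    # navigate from the test file up to the repo root, then to the config directory
    REPO_ROOT = Path(__file__).resolve().parents[1]
    CONFIG_DIR = REPO_ROOT / "configs"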

jettjaniak commented 5 months ago

oh, but that won't work if it's installed instead of cloned

jettjaniak commented 5 months ago

gradient_accumulation_step should default to 1 in the code ❗
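
Presumably that means giving the field a default in the config class rather than requiring it in every JSON file. A rough sketch, assuming a dataclass-style config (the class name here is hypothetical, not necessarily delphi's actual class):

    from dataclasses import dataclass

    @dataclass
    class TrainingConfig:  # hypothetical name for illustration
        # used whenever a config file omits the field
        gradient_accumulation_step: int = 1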

jettjaniak commented 5 months ago

we need to look at all of these

    "run_name": " ",
    "output_dir": " ",
    "device": "auto",
    "eval_interval": 2000,
    "log_interval": 1,
    "eval_iters": 100,
    "eval_only": false,
    "always_save_checkpoint": false,
    "init_from": "scratch",
    "wandb_config": {
        "log": false,
        "project": " ",
        "entity": " "
    },

and change if needed

jettjaniak commented 5 months ago

all of the model and training configs were discussed and updated

jaidhyani commented 5 months ago

#116 is a PR on top of this one to migrate static assets to the new top-level non-package path.

jettjaniak commented 5 months ago

we need to find reasonable values for these

    "checkpoint_interval": -1000,
    "extra_checkpoint_iters": [
        -1000,
        -2000
    ],
    "log_interval": -1000,