SrGonao closed this 5 months ago
@jettjaniak, about inheritance: I haven't been able to get it working. E.g., if I have:
"model_config": {
"model_class": "MambaForCausalLM",
"vocab_size": 4096,
"conv_kernel": 4,
"expand": 2,
"use_bias": false,
"use_conv_bias": true,
"hidden_act": "silu",
"initializer_range": 0.1,
"residual_in_fp32": true,
"tie_word_embeddings": true
}
in one `mamba.json` file, and:
"hidden_size": 952,
"num_hidden_layers": 8
}
in a `50M.json` file, and I run:

```bash
python scripts/run_training.py --config_files mamba.json --config_files 50M.json
```
I get `KeyError: 'model_class'`.
Could you verify, @jaidhyani? I have changed all the files to that structure either way.
`--config_files` should be passed exactly once, with space-separated arguments if you want to pass multiple configs.
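So the invocation above would presumably become:

```bash
python scripts/run_training.py --config_files mamba.json 50M.json
```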
TODOs
- [ ] fix existing tests so they can access configs in new location
AFAIK there's not really a good way to include static files from outside of a package's root directory, by design. Basically all the documentation on the subject assumes that static files live under the project root (e.g. `src/` for us). I've just spent some time messing around with ways to get around this with `MANIFEST.in` and different `setup.py` options, and couldn't get anything working.
Can't we just determine the test file's path and navigate from there to the repo root? Something like the sketch below:
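A minimal sketch of what I mean, assuming the test file lives one directory below the repo root (e.g. `tests/test_config.py`; the `configs` directory name is hypothetical — adjust both as needed):

```python
from pathlib import Path

# Walk up from this test file to the repo root, then into the configs directory.
REPO_ROOT = Path(__file__).resolve().parents[1]
CONFIG_DIR = REPO_ROOT / "configs"
```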
Oh, but that won't work if the package is installed instead of cloned.
`gradient_accumulation_step` should default to 1 in the code ❗
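A minimal sketch of what that default could look like, assuming the training config is a dataclass (the class name here is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    # Default to 1 so configs that don't accumulate gradients can omit the field.
    gradient_accumulation_step: int = 1
```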
We need to look at all of these:
"run_name": " ",
"output_dir": " ",
"device": "auto",
"eval_interval": 2000,
"log_interval": 1,
"eval_iters": 100,
"eval_only": false,
"always_save_checkpoint": false,
"init_from": "scratch",
"wandb_config": {
"log": false,
"project": " ",
"entity": " "
},
and change them if needed.
All of the model and training configs were discussed and updated.
We need to find reasonable values for these:
"checkpoint_interval": -1000,
"extra_checkpoint_iters": [
-1000,
-2000
],
"log_interval": -1000,
(I messed up the name of this branch; I tried renaming it, but it didn't update.) I added a default Mamba and Llama 2 base config (they are actually the same), and then the different Llama 2 and Mamba sizes.