Modalities / modalities

A framework for training multimodal foundation models.
MIT License

Unused attributes in a component's config yaml are just ignored by Pydantic #97

Closed le1nux closed 1 month ago

le1nux commented 4 months ago

We should make sure that an exception is raised when a component's config in the YAML contains additional, unused attributes.

E.g.,

train_dataset:  
  component_key: dataset
  variant_key: packed_mem_map_dataset_continuous
  config:
    raw_data_path: /workspaces/modalities/data/redpajama_v2/mem_map/redpajama_v2_samples_512_test.pbin
    index_path: /workspaces/modalities/data/redpajama_v2/mem_map/redpajama_v2_samples_512_test.idx
    block_size: ${settings.training.sequence_length}
    jq_pattern: ".text"
    sample_key:  ${settings.referencing_keys.sample_key}
    tokenizer:
      instance_key: tokenizer
      pass_type: BY_REFERENCE

should throw an exception, since block_size, jq_pattern and tokenizer are not used by this component. The only attributes needed are

train_dataset:  
  component_key: dataset
  variant_key: packed_mem_map_dataset_continuous
  config:
    raw_data_path: /workspaces/modalities/data/redpajama_v2/mem_map/redpajama_v2_samples_512_test.pbin
    index_path: /workspaces/modalities/data/redpajama_v2/mem_map/redpajama_v2_samples_512_test.idx
    sample_key:  ${settings.referencing_keys.sample_key}
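
For illustration, here is a minimal sketch of how the requested behaviour could look, assuming Pydantic v2. By default Pydantic ignores extra fields (`extra="ignore"`); setting `extra="forbid"` makes validation fail instead. `StrictConfig`, `PackedMemMapDatasetConfig` and `find_unused_attributes` are hypothetical names used only for this example, not the actual Modalities classes, and the field types are simplified to `str`.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictConfig(BaseModel):
    """Hypothetical base class: reject any attribute the model does not declare."""

    model_config = ConfigDict(extra="forbid")


class PackedMemMapDatasetConfig(StrictConfig):
    # Hypothetical stand-in for the real component config.
    # Field names follow the YAML above; types are simplified.
    raw_data_path: str
    index_path: str
    sample_key: str


def find_unused_attributes(config: dict) -> list[str]:
    """Return the names of attributes rejected as extra (empty if there are none)."""
    try:
        PackedMemMapDatasetConfig(**config)
    except ValidationError as exc:
        # Pydantic v2 reports each undeclared field with the error type "extra_forbidden".
        return sorted(
            str(err["loc"][0])
            for err in exc.errors()
            if err["type"] == "extra_forbidden"
        )
    return []
```

With this sketch, constructing `PackedMemMapDatasetConfig` from the first config above would raise a `ValidationError` naming `block_size`, `jq_pattern` and `tokenizer`, while the reduced config would validate. Putting the setting on a shared base class is one possible design choice, so that every component config inherits it.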

le1nux commented 2 months ago

@lllAlexanderlll could you look into this, please? :-)