Closed le1nux closed 1 month ago
We should make sure that an exception is raised when the config YAML contains additional, unused attributes.
E.g.,

    train_dataset:
      component_key: dataset
      variant_key: packed_mem_map_dataset_continuous
      config:
        raw_data_path: /workspaces/modalities/data/redpajama_v2/mem_map/redpajama_v2_samples_512_test.pbin
        index_path: /workspaces/modalities/data/redpajama_v2/mem_map/redpajama_v2_samples_512_test.idx
        block_size: ${settings.training.sequence_length}
        jq_pattern: ".text"
        sample_key: ${settings.referencing_keys.sample_key}
        tokenizer:
          instance_key: tokenizer
          pass_type: BY_REFERENCE

should throw an exception, as the only attributes needed are:

    train_dataset:
      component_key: dataset
      variant_key: packed_mem_map_dataset_continuous
      config:
        raw_data_path: /workspaces/modalities/data/redpajama_v2/mem_map/redpajama_v2_samples_512_test.pbin
        index_path: /workspaces/modalities/data/redpajama_v2/mem_map/redpajama_v2_samples_512_test.idx
        sample_key: ${settings.referencing_keys.sample_key}

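To make the requested behavior concrete, here is a minimal sketch of a strict config check using only the standard library. It is not the project's actual implementation: the class name `PackedMemMapDatasetConfig`, the helper `build_config`, and the file names in the usage part are all hypothetical and exist only for illustration. If the config classes are pydantic models, setting `extra="forbid"` in the model config should give the same effect without a custom helper.

```python
from dataclasses import dataclass, fields


@dataclass
class PackedMemMapDatasetConfig:  # hypothetical name, for illustration only
    raw_data_path: str
    index_path: str
    sample_key: str


def build_config(config_cls, raw: dict):
    """Instantiate config_cls from raw, rejecting keys it does not declare."""
    allowed = {f.name for f in fields(config_cls)}
    unused = sorted(set(raw) - allowed)
    if unused:
        # Fail loudly instead of silently ignoring the extra attributes.
        raise ValueError(f"Unused attributes in {config_cls.__name__}: {unused}")
    return config_cls(**raw)


# Usage: block_size and jq_pattern are not declared, so this is rejected.
raw_config = {
    "raw_data_path": "data.pbin",  # placeholder path
    "index_path": "data.idx",  # placeholder path
    "sample_key": "input_ids",  # placeholder value
    "block_size": 512,
    "jq_pattern": ".text",
}
try:
    build_config(PackedMemMapDatasetConfig, raw_config)
except ValueError as err:
    print(err)
```

Reporting all unused keys at once, rather than stopping at the first one, should make it easier to clean up a stale config in a single pass.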
@lllAlexanderlll could you look into this, please? :-)