Kipok / NeMo-Skills

A pipeline to improve skills of large language models
https://kipok.github.io/NeMo-Skills/
Apache License 2.0

add setting for large-scale data training #223

Closed. wedu-nvidia closed this 6 days ago

wedu-nvidia commented 6 days ago

Add settings for large-scale data training:

++model.data.train_ds.hf_dataset=True \
++model.data.train_ds.index_mapping_dir=/data/your_data_cache/
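
For readers who prefer editing a config file over passing command-line overrides, the two settings above can be sketched as the nested YAML fragment below. This assumes the training config is a Hydra-style YAML file whose layout mirrors the dotted keys (`model.data.train_ds`); the PR itself only shows the command-line form, and `/data/your_data_cache/` is the placeholder path from the PR, not a real location.

```yaml
# Assumed YAML equivalent of the two overrides in this PR.
# The nesting is derived from the dotted keys; the surrounding
# config file layout is an assumption, not taken from the PR.
model:
  data:
    train_ds:
      hf_dataset: true                            # from ++model.data.train_ds.hf_dataset=True
      index_mapping_dir: /data/your_data_cache/   # placeholder path, replace with your own cache directory
```

The `++` prefix in a Hydra override means "add the key if it is missing, otherwise override it", so the command-line form works whether or not the base config already defines these keys.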