Kipok / NeMo-Skills

A pipeline to improve skills of large language models
https://kipok.github.io/NeMo-Skills/
Apache License 2.0

shtoshni/code sft data preparation #84

Closed by shtoshni 3 months ago

shtoshni commented 3 months ago

SFT data preparation scripts

Kipok commented 3 months ago

Looks good! One request: could you please add a test for the new processor, similar to this one? https://github.com/Kipok/NeMo-Skills/blob/main/tests/test_data_preparation.py#L13

shtoshni commented 3 months ago

Added test to test_data_preparation