OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0

feat(data): Saving huggingface dataset #221

Closed: kriangkraitan closed this pull request 1 year ago

kriangkraitan commented 1 year ago

Why this PR

To concatenate all JSONL files into a single dataset saved in the Hugging Face dataset format.

To use this code, run: python your_script.py --train_path /train/path/*.jsonl --eval_path /eval/path/*.jsonl --output_path /output/path
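The script itself is not shown here. What follows is a minimal sketch of how a script with this command line might work. It rests on several assumptions: the `datasets` library is installed, every JSONL line holds one JSON object, all files share the same fields, and the splits are named `train` and `eval`. The helper names (`expand_paths`, `read_jsonl_files`, `save_as_hf_dataset`, `build_parser`) are hypothetical and are not taken from the repository.

```python
import argparse
import glob
import json
import sys
from typing import Any, Dict, List, Optional


def expand_paths(patterns: List[str]) -> List[str]:
    """Expand glob patterns such as /train/path/*.jsonl into a list of files.

    Works for quoted patterns and for paths the shell already expanded.
    """
    files: List[str] = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if not matches:
            raise FileNotFoundError(f"no files match {pattern!r}")
        files.extend(matches)
    return files


def read_jsonl_files(paths: List[str]) -> List[Dict[str, Any]]:
    """Concatenate every JSON object from every JSONL file into one list."""
    records: List[Dict[str, Any]] = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))
    return records


def save_as_hf_dataset(train_records, eval_records, output_path: str) -> None:
    # Imported lazily so the helpers above run without `datasets` installed.
    # Assumption: all records in a split share one schema.
    from datasets import Dataset, DatasetDict

    dataset = DatasetDict({
        "train": Dataset.from_list(train_records),
        "eval": Dataset.from_list(eval_records),  # split name is an assumption
    })
    dataset.save_to_disk(output_path)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Concatenate JSONL files into a Hugging Face dataset."
    )
    # nargs="+" accepts one quoted glob or many shell-expanded paths.
    parser.add_argument("--train_path", nargs="+", required=True)
    parser.add_argument("--eval_path", nargs="+", required=True)
    parser.add_argument("--output_path", required=True)
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    train_records = read_jsonl_files(expand_paths(args.train_path))
    eval_records = read_jsonl_files(expand_paths(args.eval_path))
    save_as_hf_dataset(train_records, eval_records, args.output_path)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main()
    else:
        build_parser().print_help()  # no arguments: show usage instead of failing
```

Because of `nargs="+"`, the command above works in two cases: the shell expands `*.jsonl` into many paths, or the pattern is quoted and `glob` expands it in Python. The saved directory can then be reloaded with `datasets.load_from_disk("/output/path")`.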

Changes

Related Issues

Close #

Checklist