OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0
21 stars, 10 forks

feat(data): Split jsonl data #227

Closed. kriangkraitan closed this 1 year ago

kriangkraitan commented 1 year ago

Why this PR

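Adds split_jsonl.py, a script that splits a JSONL file into subsets, with the test and validation set sizes given on the command line. Usage:

python split_jsonl.py /path/to/file.jsonl --test_size 0.01 --validation_size 0.001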
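The script itself is not shown in this description, so the following is only a minimal sketch of how such a split might work, not the PR's actual implementation. It assumes that --test_size and --validation_size are fractions of the total number of records, that the remaining records form a training set, and that records are shuffled with a fixed seed. The function name split_jsonl_lines and the seed parameter are hypothetical.

```python
import random


def split_jsonl_lines(lines, test_size=0.01, validation_size=0.001, seed=0):
    """Shuffle JSONL lines and partition them into train/validation/test lists.

    Assumption: both sizes are fractions of the total record count,
    and everything not assigned to test or validation becomes train.
    """
    if test_size < 0 or validation_size < 0 or test_size + validation_size > 1:
        raise ValueError("sizes must be non-negative fractions summing to at most 1")

    # Keep one record per non-blank line; JSONL stores one JSON object per line.
    records = [line for line in lines if line.strip()]

    # A seeded shuffle makes the split reproducible across runs.
    random.Random(seed).shuffle(records)

    n_test = int(len(records) * test_size)
    n_val = int(len(records) * validation_size)

    test = records[:n_test]
    validation = records[n_test:n_test + n_val]
    train = records[n_test + n_val:]
    return train, validation, test
```

With 1,000 input records and the sizes from the usage line above, this sketch would place 10 records in the test set and 1 in the validation set. A command-line wrapper would read the lines from /path/to/file.jsonl and write each list to its own output file.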

Changes

Related Issues

Close #

Checklist