OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0

feat(data): clean&reformat pantip 3G dataset #220

Closed: kriangkraitan closed this pull request 1 year ago

kriangkraitan commented 1 year ago

Why this PR

This PR adds a script that cleans the Pantip 3G dataset and reformats it to JSONL.

To use the script, run:

python reformat.py --input_folder <input file directory> --output_folder <output file directory>
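For reviewers who want a picture of what a script with this interface might do, here is a minimal sketch. It is not the code in this PR. Only the two flags, --input_folder and --output_folder, come from the description above. Everything else is an assumption made for illustration: that the input folder holds UTF-8 .txt files with one record per line, that cleaning means collapsing whitespace, that each output record is a JSON object with a single "text" field, and the helper names clean_text and reformat_folder.

```python
import argparse
import json
import re
from pathlib import Path


def clean_text(raw: str) -> str:
    """Hypothetical cleaning step: collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


def reformat_folder(input_folder: str, output_folder: str) -> int:
    """Write one .jsonl file per input .txt file; return the number of records written.

    Assumption: each non-empty input line is one record.
    """
    in_dir = Path(input_folder)
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = 0
    for src in sorted(in_dir.glob("*.txt")):
        dst = out_dir / (src.stem + ".jsonl")
        with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
            for line in fin:
                text = clean_text(line)
                if not text:
                    continue  # skip lines that are empty after cleaning
                # ensure_ascii=False keeps Thai characters readable in the output
                fout.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
                written += 1
    return written


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Clean text files and write JSONL.")
    parser.add_argument("--input_folder", required=True, help="directory with input files")
    parser.add_argument("--output_folder", required=True, help="directory for JSONL output")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    count = reformat_folder(args.input_folder, args.output_folder)
    print(f"wrote {count} records to {args.output_folder}")
```

Writing one JSON object per line keeps the output streamable, so a 3G-scale dataset never has to be held in memory at once.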

Changes

Related Issues

Close #

Checklist