OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0
21 stars 10 forks

2g 3g pantip preprocess LM-156 #327

Open ArthurMinovsky opened 10 months ago

ArthurMinovsky commented 10 months ago

Why this PR

Why we need this PR?

Changes

Related Issues

Close #

Checklist

linear[bot] commented 10 months ago
LM-156 Pantip datasets

[image.png](https://uploads.linear.app/03a3f0b5-8e51-4d0f-918c-59e891b8184f/f19890ea-c0d4-440c-baa5-85210806ce2c/4a591f83-5d87-42c4-9cac-6fa9f33ca412)

* [ ] Send path of Pantip data: original, preprocessed
* [ ] Preprocessing Code