OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0

Crawl and clean PRD data #140

Closed: Chawak closed this issue 1 year ago

Chawak commented 1 year ago

Source: https://www.prd.go.th/th/page/item/index/id/1#

nakarin commented 1 year ago

Completed. You can load the dataset from Hugging Face at https://huggingface.co/datasets/nakcnx/prd_news.
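
For anyone who wants to try it, here is a minimal sketch of loading the dataset with the Hugging Face `datasets` library. The helper names `repo_id_from_url` and `load_prd_news` are hypothetical and only for illustration; the split and column names are not stated in this thread, so inspect the returned object rather than assuming them.

```python
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract the '<user>/<name>' dataset id from a Hugging Face dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:3])


def load_prd_news(url: str = "https://huggingface.co/datasets/nakcnx/prd_news"):
    """Download the dataset (hypothetical helper).

    Requires network access and `pip install datasets`.
    """
    # Imported lazily so repo_id_from_url stays usable without the library.
    from datasets import load_dataset

    return load_dataset(repo_id_from_url(url))
```

Calling `load_prd_news()` and printing the result should list whatever splits and columns the dataset actually has.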

Chawak commented 1 year ago

Is there anything left to clean in the dataset, krub P'Name? If not, this issue can be closed.

nakarin commented 1 year ago

Nothing to clean krub.