Closed boss-chanon closed 10 months ago
**Rationale**: Use the Office of the Council of State (สำนักงานคณะกรรมการกฤษฎีกา) website as part of the law dataset to expand our pre-trained model's knowledge base.

**Step by Step**

1. Download data from this website: [พระราชบัญญัติ (Acts)](https://www.krisdika.go.th/web/guest/law?p_p_id=LawPortlet_INSTANCE_aAN7C2U5hENi&p_p_state=normal&p_p_mode=view&\_LawPortlet_INSTANCE_aAN7C2U5hENi_javax.portlet.action=selectLawTypeMenu&\_LawPortlet_INSTANCE_aAN7C2U5hENi_lawTypeId=2&p_auth=Fxeer5Zp&p_p_lifecycle=0)
2. Select "ตามหัวเรื่อง" ("By topic") and download as many topics and sub-topics as we can.
3. Scrape every document available on the website.
4. Extract the text from the PDF files.
5. Convert the data into JSONL format. [image.png](https://uploads.linear.app/03a3f0b5-8e51-4d0f-918c-59e891b8184f/fddb5be0-0169-44b8-ab8b-e88666fd8777/475103ae-d36d-4170-9b68-44d87fe1ec01)
6. Open a pull request to our GitHub repository.

Reviewers: kwankoravich
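Step 5 (converting the extracted text into JSONL) could be sketched roughly as below. This is only an illustration under assumptions: the field names `title`, `text`, and `source` and the helpers `build_record` and `write_jsonl` are hypothetical, since the target schema is only shown in the attached image, and the example URL is a placeholder.

```python
import json
from typing import Iterable


def build_record(title: str, text: str, source_url: str) -> dict:
    """Build one record. The schema here is hypothetical, not the project's."""
    return {"title": title, "text": text.strip(), "source": source_url}


def write_jsonl(records: Iterable[dict], out_path: str) -> int:
    """Write one JSON object per line and return how many were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps Thai text readable instead of \uXXXX escapes.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Placeholder document; real text would come from the PDF extraction step.
    docs = [build_record("Example Act", "  extracted text  ", "https://example.org/act.pdf")]
    write_jsonl(docs, "krisdika_sample.jsonl")
```

Writing with `ensure_ascii=False` and UTF-8 encoding keeps the Thai text human-readable in the output file, which makes spot-checking the scraped data easier.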
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (3b893c9) 64.16% compared to head (cf01588) 64.16%. Report is 9 commits behind head on main.
This PR can't be used because the UI was changed.
Why this PR
Scrape king data from Krisdika. This PR continues from #334.
Changes
Related Issues
Close #
Checklist