OpenThaiGPT/openthaigpt-pretraining
Apache License 2.0
21 stars, 10 forks
fix(data): data dedup and decontaminate not working
#237
Closed
new5558 closed 1 year ago
new5558 commented 1 year ago
Why this PR
Why do we need this PR?
Changes
Write some changes here
Related Issues
Close #
Checklist
[ ] PR should follow the Naming convention
[ ] Assign yourself to Assignees
[ ] Tag related issues
[ ] Constant names should be ALL_CAPITAL, function names should be snake_case, and class names should be CamelCase
[ ] Complex functions/algorithms should have a Docstring
[ ] A PR should not change more than 200 lines (test files excepted). If it does, please open multiple PRs
[ ] At least one PR reviewer must come from the task's team (model, eval, data)
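
The naming and docstring items in the checklist can be illustrated with a short sketch. Every name below (`MIN_TEXT_LENGTH`, `TextNormalizer`, `drop_exact_duplicates`) is hypothetical and chosen only to show the conventions; this is not the dedup or decontamination code changed by this PR.

```python
# Hypothetical example of the checklist's naming conventions.
# None of these names come from the openthaigpt-pretraining code base.

MIN_TEXT_LENGTH = 3  # constant: ALL_CAPITAL


class TextNormalizer:  # class: CamelCase
    """Normalize text so trivially different copies compare equal."""

    def normalize(self, text: str) -> str:
        # Lowercase and collapse runs of whitespace into single spaces.
        return " ".join(text.lower().split())


def drop_exact_duplicates(texts: list[str]) -> list[str]:  # function: snake_case
    """Return texts without exact duplicates, keeping first occurrences.

    Texts are compared after normalization. Texts whose normalized form
    is shorter than MIN_TEXT_LENGTH characters are skipped entirely.
    Input order is preserved.
    """
    normalizer = TextNormalizer()
    seen = set()
    unique = []
    for text in texts:
        key = normalizer.normalize(text)
        if len(key) < MIN_TEXT_LENGTH or key in seen:
            continue
        seen.add(key)
        unique.append(text)
    return unique
```

Calling `drop_exact_duplicates(["Hello  world", "hello world", "ok", "Other text"])` keeps the first and last entries: the second is a duplicate after normalization and the third is below the length threshold.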