OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0

create additional dataset preprocess code [LM-153] #322

Closed ArthurMinovsky closed 9 months ago

ArthurMinovsky commented 10 months ago

Why this PR

To add preprocessing code for datasets such as LST, BEST, and SCB.

Changes

Checklist

linear[bot] commented 10 months ago
LM-153 Aut Dataset Documentation

Update this Figma with the following dataset information

boat1603 commented 10 months ago

@ArthurMinovsky Did you forget to push your commit?

codecov[bot] commented 10 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (3b893c9) 64.16% compared to head (cc9f880) 64.47%. Report is 9 commits behind head on main.

:exclamation: Current head cc9f880 differs from pull request most recent head 5fbee96. Consider uploading reports for the commit 5fbee96 to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #322      +/-   ##
==========================================
+ Coverage   64.16%   64.47%   +0.30%
==========================================
  Files          11       11
  Lines         427      425       -2
==========================================
  Hits          274      274
+ Misses        153      151       -2
```

| [Flag](https://app.codecov.io/gh/OpenThaiGPT/openthaigpt-pretraining/pull/322/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OpenThaiGPT) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/OpenThaiGPT/openthaigpt-pretraining/pull/322/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OpenThaiGPT) | `64.47% <ø> (+0.30%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OpenThaiGPT#carryforward-flags-in-the-pull-request-comment) to find out more.


ArthurMinovsky commented 10 months ago

> @ArthurMinovsky Did you forget to push your commit?

Nah, I already pushed. I think you need to check the "Files changed" tab again. 🤔