OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0
feat(data): Convert Pile dataset to hf #312

Closed: boss-chanon closed this 1 year ago

boss-chanon commented 1 year ago

Why this PR

Add a pipeline that converts the Pile dataset into a Hugging Face (hf) dataset in our format.
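
For illustration, a conversion of this kind might look like the sketch below. It is not the code from this PR. It assumes the Pile's JSON Lines layout (one object per line with a `text` field and a `meta` object carrying `pile_set_name`); the output columns `text`, `source`, and `meta` and the function names `pile_to_records` and `to_hf_dataset` are hypothetical, since the project's target format is not shown here.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def pile_to_records(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield flat records from Pile-style JSON lines.

    The output schema (text, source, meta) is a hypothetical stand-in
    for the project's own format.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        meta = doc.get("meta") or {}
        yield {
            "text": doc["text"],
            "source": meta.get("pile_set_name", "unknown"),
            # Serialise meta so every row has the same column type.
            "meta": json.dumps(meta, sort_keys=True),
        }


def to_hf_dataset(lines: Iterable[str]):
    """Build a Hugging Face Dataset; requires the `datasets` package."""
    # Imported lazily so pile_to_records can be used without `datasets`.
    from datasets import Dataset

    return Dataset.from_list(list(pile_to_records(lines)))


if __name__ == "__main__":
    sample = [
        '{"text": "hello", "meta": {"pile_set_name": "ArXiv"}}',
        '{"text": "no meta here"}',
    ]
    for row in pile_to_records(sample):
        print(row)
```

Keeping the parsing step separate from the `datasets` call makes the record mapping easy to unit-test without the Hugging Face dependency.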

Changes

Related Issues

Close #

Checklist

codecov[bot] commented 1 year ago

Codecov Report

All modified lines are covered by tests :white_check_mark:

Comparison is base (361a98f) 19.39% compared to head (b610acc) 94.15%. The report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main     #312       +/-   ##
===========================================
+ Coverage   19.39%   94.15%   +74.76%
===========================================
  Files          25       10       -15
  Lines        1392      291     -1101
===========================================
+ Hits          270      274        +4
+ Misses       1122       17     -1105
```

| [Flag](https://app.codecov.io/gh/OpenThaiGPT/openthaigpt-pretraining/pull/312/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OpenThaiGPT) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/OpenThaiGPT/openthaigpt-pretraining/pull/312/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OpenThaiGPT) | `94.15% <ø> (+74.76%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OpenThaiGPT#carryforward-flags-in-the-pull-request-comment) to find out more.

[see 35 files with indirect coverage changes](https://app.codecov.io/gh/OpenThaiGPT/openthaigpt-pretraining/pull/312/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OpenThaiGPT)
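
The percentages in the coverage diff can be rechecked from the hit and line counts. The sketch below does this with integer arithmetic. It assumes the percentages are truncated to two decimals rather than rounded; that is an inference from these particular numbers, not documented Codecov behavior. The helper names `coverage_bp` and `fmt` are made up for this example.

```python
def coverage_bp(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (assumed behavior)."""
    return hits * 10000 // lines


def fmt(bp: int) -> str:
    """Format hundredths of a percent as a percentage string."""
    sign = "-" if bp < 0 else ""
    return f"{sign}{abs(bp) // 100}.{abs(bp) % 100:02d}%"


# Counts taken from the coverage diff in this comment.
base_hits, base_lines = 270, 1392
head_hits, head_lines = 274, 291

base = coverage_bp(base_hits, base_lines)
head = coverage_bp(head_hits, head_lines)

print("base coverage:", fmt(base))
print("head coverage:", fmt(head))
print("delta:", fmt(head - base))
print("base misses:", base_lines - base_hits)
print("head misses:", head_lines - head_hits)
```

Most of the jump in coverage comes from the drop in measured lines (1392 to 291), since hits rise only from 270 to 274.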
