Lightning-AI / litdata

Transform datasets at scale. Optimize datasets for fast AI model training.
Apache License 2.0

correct the chunk size by adding header size #394

Closed dangthatsright closed 1 month ago

dangthatsright commented 1 month ago

What does this PR do?

Fixes #393

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

thanks @tchaton for the help!

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 72%. Comparing base (b9aa903) to head (af5db07). Report is 1 commit behind head on main.

:exclamation: There is a different number of reports uploaded between BASE (b9aa903) and HEAD (af5db07).

HEAD has 12 uploads less than BASE

| Flag | BASE (b9aa903) | HEAD (af5db07) |
|------|------|------|
|unittests|6|2|
|3.10|3|1|
|3.9|3|1|
|ubuntu-22.04|2|0|
|macos-13|2|0|

Additional details and impacted files

```diff
@@         Coverage Diff          @@
##           main   #394    +/-   ##
====================================
- Coverage    78%    72%    -7%
====================================
  Files        34     34
  Lines      5037   5039     +2
====================================
- Hits       3941   3607   -334
- Misses     1096   1432   +336
```

bhimrazy commented 1 month ago

Closing this PR as issue #393 has been resolved by #395.

Thank you @dangthatsright for taking the time to raise this issue and work on it. We encourage you to test it with the latest updates from the main branch, and we welcome any feedback on how it goes.