Lightning-AI / litdata

Transform datasets at scale. Optimize datasets for fast AI model training.
Apache License 2.0

Multithreading function for merge_datasets #413

Closed: yhl48 closed this pull request 2 weeks ago

yhl48 commented 3 weeks ago
Before submitting

- [ ] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
- [x] Did you read the [contributor guideline](https://github.com/Lightning-AI/lit-data/blob/main/CONTRIBUTING.md), Pull Request section?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

What does this PR do?

Adds multithreading to merge_datasets
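
The diff itself is not reproduced on this page, so the following is only a minimal sketch of the general technique, not the code from this PR. It assumes that merging datasets is dominated by copying files into an output directory, which is I/O-bound work that a thread pool can overlap. The helper name `copy_files_threaded` and its parameters are hypothetical and are not part of litdata.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_files_threaded(sources, dest_dir, max_workers=4):
    """Copy each file in `sources` into `dest_dir` using a thread pool.

    Hypothetical helper for illustration only; not litdata's API.
    Returns the destination paths in the same order as `sources`.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def _copy(src):
        # Keep the original file name; callers must avoid name collisions.
        target = dest / Path(src).name
        shutil.copyfile(src, target)
        return target

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order; a worker's exception is re-raised
        # when its result is read while building the list.
        return list(pool.map(_copy, sources))
```

Threads suit this kind of work because file copies spend most of their time waiting on I/O, during which the interpreter can run other threads. A call such as `copy_files_threaded(paths, "merged/")` would return the copied paths once every copy has finished.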

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78%. Comparing base (ac0c89b) to head (7555aaf). Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@         Coverage Diff          @@
##           main   #413   +/-   ##
===================================
  Coverage    78%    78%
===================================
  Files        34     34
  Lines      5052   5056    +4
===================================
+ Hits       3960   3964    +4
  Misses     1092   1092
```