Lightning-AI / litdata

Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training.
Apache License 2.0

Add first draft for multi modal model training text & image #160

Closed rakro101 closed 2 weeks ago

rakro101 commented 3 weeks ago
Before submitting

- [x] Was this discussed/agreed via a Github issue? (no need for typos and docs improvements) #140 https://github.com/Lightning-AI/litdata/issues/140
- [ ] Did you read the [contributor guideline](https://github.com/Lightning-AI/lit-data/blob/main/.github/CONTRIBUTING.md), Pull Request section?
- [x] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

What does this PR do?

Add an example for multimodal model training.

Fixes # (issue).

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Please upload report for BASE (main@4cc7945). Learn more about missing BASE report.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #160   +/-   ##
=====================================
  Coverage        ?     77%
=====================================
  Files           ?      30
  Lines           ?    4151
  Branches        ?       0
=====================================
  Hits            ?    3206
  Misses          ?     945
  Partials        ?       0
```

rakro101 commented 2 weeks ago

> Did you read the contributor guideline, Pull Request section?

This file does not exist, @tchaton.

rakro101 commented 2 weeks ago

> Few more comments. Also, is there a way we can run this example (in some minimal version) inside our CI?

I added cl_run.py. Does this work for you?