Closed: deependujha closed this pull request 2 weeks ago.
Attention: Patch coverage is 94.09091% with 13 lines in your changes missing coverage. Please review.
Please upload a report for BASE (main@b51b597).
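A quick sanity check on the reported figures follows. It assumes Codecov's patch coverage is the share of coverable lines in the diff that tests execute. Under that assumption, 13 missed lines at 94.09091% means the patch has 220 coverable lines, and 207 of them are hit. The helper name `patch_coverage` is illustrative only.

```python
def patch_coverage(coverable_lines: int, missed_lines: int) -> float:
    """Percentage of coverable patch lines executed by tests (assumed Codecov-style ratio)."""
    if coverable_lines <= 0:
        raise ValueError("coverable_lines must be positive")
    hit_lines = coverable_lines - missed_lines
    return 100 * hit_lines / coverable_lines


# 207 of 220 coverable lines hit, 13 missed.
print(round(patch_coverage(220, 13), 5))  # 94.09091
```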
If `subsample` is 1 (the default, or when passed explicitly), the expensive tabulation step (which is optimized separately) is skipped. Sampling also uses a locally seeded random generator, so the user's global random seed is left unchanged.
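The comment above describes two ideas: a fast path when `subsample` is 1, and a private random generator so that sampling never reseeds the caller's global state. Both can be sketched in plain Python. This is a minimal illustration under stated assumptions, not litdata's implementation. `subsample_indices` is a hypothetical helper, and it assumes `subsample` is a fraction in (0, 1] applied to item indices.

```python
import random
from typing import List


def subsample_indices(num_items: int, subsample: float = 1.0, seed: int = 42) -> List[int]:
    """Hypothetical helper: keep a random fraction of item indices.

    Assumes subsample is a fraction in (0, 1]. The global `random` module
    state is never touched.
    """
    if not 0 < subsample <= 1:
        raise ValueError("subsample must be in (0, 1]")
    if subsample == 1:
        # Fast path: nothing to sample, so skip the selection work entirely.
        return list(range(num_items))
    # A private Random instance. Seeding it does not reseed the module-level
    # generator that the user's own code may rely on.
    rng = random.Random(seed)
    k = int(num_items * subsample)
    # Sorting keeps the kept items in their original storage order.
    return sorted(rng.sample(range(num_items), k))
```

For example, call `random.seed(0)` before and after `subsample_indices(100, 0.25, seed=7)`. In both cases the next `random.random()` returns the same value, because the helper draws only from its own generator.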
The only remaining issue is: wrong chunk size with `dim`.
In addition, `test_s3_streaming_dataset` passes in Lightning Studio but fails in CI.
cc: @tchaton
Before submitting
- [x] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
- [x] Did you read the [contributor guideline](https://github.com/Lightning-AI/lit-data/blob/main/.github/CONTRIBUTING.md), Pull Request section?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

What does this PR do?
Fixes #135 & fixes #145.
Adds support for slicing `StreamingDataset`.
![Screenshot from 2024-06-05 13-20-06](https://github.com/Lightning-AI/litdata/assets/76887609/3bf1f180-c601-498a-a4b1-50e8e7512a7e)
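One plausible reading of slice support is that `dataset[start:stop:step]` returns the selected items, with bounds normalized the way Python lists normalize them. The sketch below uses a hypothetical `SliceableDataset` in place of `StreamingDataset`. litdata's actual return type and edge-case behavior may differ.

```python
from typing import Any, Sequence, Union


class SliceableDataset:
    """Hypothetical map-style dataset used for illustration; not litdata's StreamingDataset."""

    def __init__(self, items: Sequence[Any]) -> None:
        self._items = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: Union[int, slice]) -> Any:
        if isinstance(index, slice):
            # slice.indices clamps start/stop/step to the dataset length, so
            # negative and out-of-range bounds behave like list slicing.
            start, stop, step = index.indices(len(self))
            return [self[i] for i in range(start, stop, step)]
        if index < 0:
            index += len(self)  # support negative integer indices
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._items[index]
```

With `ds = SliceableDataset(range(10))`, `ds[2:5]` gives `[2, 3, 4]` and `ds[::4]` gives `[0, 4, 8]`. Delegating to `slice.indices` avoids hand-rolling negative-bound and out-of-range handling.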
Adds support for subsampling `StreamingDataset`.
![Screenshot from 2024-06-08 01-29-54](https://github.com/Lightning-AI/litdata/assets/76887609/a1e7cf39-c50a-47a9-aeec-5ef72347560f)
Adds support for `train_test_split` on `StreamingDataset`.
![Screenshot from 2024-06-08 01-29-18](https://github.com/Lightning-AI/litdata/assets/76887609/3fc3fea3-252a-4152-98a8-12a9dada91fc)
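At its core, a train/test split over a dataset means partitioning item indices into disjoint groups. The sketch below shuffles the indices with a locally seeded generator, mirroring the seed-isolation idea from the subsampling discussion, and then cuts them by fraction. `split_indices`, its `splits` argument, and the round-down sizing rule are assumptions for illustration, not litdata's API.

```python
import random
from typing import List, Sequence


def split_indices(num_items: int, splits: Sequence[float], seed: int = 42) -> List[List[int]]:
    """Hypothetical helper: partition item indices into disjoint, shuffled splits.

    Each fraction in `splits` must be positive, and together they may not exceed 1.
    """
    if any(fraction <= 0 for fraction in splits) or sum(splits) > 1 + 1e-9:
        raise ValueError("splits must be positive and sum to at most 1")
    indices = list(range(num_items))
    # Local generator: a reproducible shuffle that leaves the global RNG alone.
    random.Random(seed).shuffle(indices)
    result: List[List[int]] = []
    start = 0
    for fraction in splits:
        size = int(num_items * fraction)  # round down
        result.append(indices[start:start + size])
        start += size
    return result
```

Because each size is rounded down, fractions that do not divide the dataset evenly (or that sum to less than 1) leave a few indices unassigned. A real implementation might give the remainder to the last split instead.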
PR review
Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃