Closed: deependujha closed this 2 weeks ago.
Attention: Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.
Please upload report for BASE (main@935eef5).
Hey @deependujha, how is it going?
I modified the item_loader intervals: instead of returning a list `[start_chunk_idx, end_chunk_idx]` for each chunk, it now returns `[start_chunk_idx, my_chunk_start, my_chunk_end, end_chunk_idx]`. Refer to this line: streaming/item_loader#192.
`my_chunk_start` and `my_chunk_end` denote the range of indices within the current chunk that this streaming dataset is allowed to read. To read the whole chunk, the interval is simply `[start_chunk_idx, start_chunk_idx, end_chunk_idx, end_chunk_idx]`.
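To make the four-element interval layout concrete, here is a minimal Python sketch. The helper names (`full_chunk_interval`, `readable_indices`, `local_offsets`) are hypothetical and not part of item_loader, and the sketch assumes end indices are exclusive (half-open ranges), which the description above does not state explicitly.

```python
from typing import List

# Hypothetical alias for one chunk interval, following the layout described above:
# [start_chunk_idx, my_chunk_start, my_chunk_end, end_chunk_idx]
# Assumption: end indices are exclusive (half-open ranges).
Interval = List[int]


def full_chunk_interval(start_chunk_idx: int, end_chunk_idx: int) -> Interval:
    """Interval that allows reading the whole chunk."""
    return [start_chunk_idx, start_chunk_idx, end_chunk_idx, end_chunk_idx]


def readable_indices(interval: Interval) -> range:
    """Global item indices this dataset is allowed to read from the chunk."""
    start_chunk_idx, my_chunk_start, my_chunk_end, end_chunk_idx = interval
    # The readable region must lie inside the chunk boundaries.
    if not start_chunk_idx <= my_chunk_start <= my_chunk_end <= end_chunk_idx:
        raise ValueError(f"invalid interval: {interval}")
    return range(my_chunk_start, my_chunk_end)


def local_offsets(interval: Interval) -> range:
    """Offsets of the readable items relative to the first item of the chunk."""
    start_chunk_idx, my_chunk_start, my_chunk_end, _ = interval
    return range(my_chunk_start - start_chunk_idx, my_chunk_end - start_chunk_idx)


# A chunk holding global items 100..199 where only items 120..149 may be read.
partial = [100, 120, 150, 200]
print(list(readable_indices(full_chunk_interval(100, 200)))[:3])  # [100, 101, 102]
print(local_offsets(partial))  # range(20, 50)
```

Keeping the chunk boundaries alongside the readable region means the loader can still locate and download the whole chunk file while only yielding the permitted items.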
Logic for subsampling and train_test_split: each chunk is used as the basis for both operations.
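The diagrams for these two operations are not reproduced here, so the following is only one plausible reading of "each chunk as a basis", expressed with the four-element interval layout. The function names are hypothetical, not litdata's API; the sketch assumes half-open regions and floor rounding, and it only moves `my_chunk_start`/`my_chunk_end`, leaving the chunk boundaries untouched.

```python
from typing import List, Sequence, Tuple

# [start_chunk_idx, my_chunk_start, my_chunk_end, end_chunk_idx], half-open (assumption)
Interval = List[int]


def subsample_intervals(intervals: Sequence[Interval], fraction: float) -> List[Interval]:
    """Keep roughly `fraction` of the readable items of every chunk (hypothetical helper)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    result = []
    for start, my_start, my_end, end in intervals:
        keep = int((my_end - my_start) * fraction)  # floor of the per-chunk share
        # Shrink the readable region from the right; chunk boundaries stay the same.
        result.append([start, my_start, my_start + keep, end])
    return result


def train_test_split_intervals(
    intervals: Sequence[Interval], test_fraction: float
) -> Tuple[List[Interval], List[Interval]]:
    """Split every chunk's readable region into a train part and a test part (hypothetical)."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be in [0, 1]")
    train, test = [], []
    for start, my_start, my_end, end in intervals:
        n_test = int((my_end - my_start) * test_fraction)
        split_at = my_end - n_test  # train reads [my_start, split_at), test reads [split_at, my_end)
        train.append([start, my_start, split_at, end])
        test.append([start, split_at, my_end, end])
    return train, test


chunks = [[0, 0, 10, 10], [10, 10, 20, 20]]
print(subsample_intervals(chunks, 0.5))         # [[0, 0, 5, 10], [10, 10, 15, 20]]
print(train_test_split_intervals(chunks, 0.5))  # train/test halves of each chunk
```

In this sketch every chunk contributes to both subsets, so the train and test splits never overlap but both stay spread across the whole dataset; the trade-off is that both splits may need to download every chunk.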
I recommend always starting by writing tests when developing new features. This helps ensure that the changes you make are correct.
Closing in favor of #161
Before submitting
- [ ] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
- [ ] Did you read the [contributor guideline](https://github.com/Lightning-AI/lit-data/blob/main/.github/CONTRIBUTING.md), Pull Request section?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

What does this PR do?
Fixes #135 and fixes #145.
- Adds support for slicing StreamingDataset
- Adds support for subsampling StreamingDataset
- Adds support for train_test_split on StreamingDataset
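Slicing can be expressed in the same four-element interval layout by clipping each chunk's readable region to the requested range. This is a minimal sketch, assuming half-open ranges; `slice_intervals` is a hypothetical name, and it handles only a non-negative `start:stop` range (no negative indices or steps).

```python
from typing import List, Sequence

# [start_chunk_idx, my_chunk_start, my_chunk_end, end_chunk_idx], half-open (assumption)
Interval = List[int]


def slice_intervals(intervals: Sequence[Interval], start: int, stop: int) -> List[Interval]:
    """Restrict the dataset to the readable items at positions [start, stop).

    Positions count only readable items, in chunk order. Chunks left with no
    readable items are dropped.
    """
    if start < 0 or stop < start:
        raise ValueError("expected 0 <= start <= stop")
    result = []
    offset = 0  # dataset position of the first readable item in the current chunk
    for chunk_start, my_start, my_end, chunk_end in intervals:
        n = my_end - my_start
        lo = max(start - offset, 0)  # first kept offset within this chunk's region
        hi = min(stop - offset, n)   # one past the last kept offset
        if lo < hi:
            result.append([chunk_start, my_start + lo, my_start + hi, chunk_end])
        offset += n
    return result


chunks = [[0, 0, 10, 10], [10, 10, 20, 20], [20, 20, 30, 30]]
print(slice_intervals(chunks, 5, 25))
# [[0, 5, 10, 10], [10, 10, 20, 20], [20, 20, 25, 30]]
```

Because positions are counted over readable items only, the same function also works on intervals that were already subsampled or split.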
PR review
Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃