Lightning-AI/litdata
Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training.
Apache License 2.0
250 stars
24 forks
issues
Feat: checkpoint optimize function to restart after crash
#206
deependujha
opened
3 hours ago
1
[pre-commit.ci] pre-commit suggestions
#205
pre-commit-ci[bot]
opened
1 day ago
0
Bump Lightning-AI/utilities from 0.11.3.post0 to 0.11.3
#204
dependabot[bot]
closed
1 day ago
1
Bump pytest from 8.2.1 to 8.2.2
#203
dependabot[bot]
closed
1 day ago
0
Bump lightning-cloud from 0.5.69 to 0.5.70
#202
dependabot[bot]
closed
1 day ago
0
Update numpy requirement from <2.0.0 to <3.0.0
#201
dependabot[bot]
opened
1 day ago
0
Bump coverage from 7.5.3 to 7.5.4
#200
dependabot[bot]
closed
1 day ago
0
Using a streaming dataloader with an unbalanced dataset yields unexpected batch sizes.
#199
esivonxay-cognitiv
opened
3 days ago
4
Fix: dataloader state dict indexerror
#198
esivonxay-cognitiv
closed
3 days ago
1
Add support for encrypting / decrypting the chunks + API Keys
#197
tchaton
opened
4 days ago
0
Index Error when calling StreamingDataLoader.state_dict() when using custom collate_fn with multiple workers
#196
esivonxay-cognitiv
closed
3 days ago
3
Add support for Mosaic Streaming WDS data format
#195
tchaton
opened
5 days ago
8
Release LitData 0.2.14
#194
tchaton
closed
5 days ago
0
When providing a local path to the optimize method, make it work in a distributed settings for Jobs
#193
tchaton
opened
5 days ago
0
Fix: unexpected behaviours (bugs) in train_test_split fixed
#192
deependujha
closed
5 days ago
3
Add support for parquet files for storing the chunks
#191
tchaton
opened
5 days ago
0
Add utility to merge datasets together
#190
tchaton
closed
5 days ago
0
Bump version 0.2.13
#189
tchaton
closed
6 days ago
0
Fix: Resolve the default weights of the combined dataset
#188
tchaton
closed
6 days ago
0
Fix: error while splitting dataset with `splits=[0.1, 0.2, 0.7]` and support split of 0.0
#187
deependujha
closed
6 days ago
1
train_test_split fails when asked for `splits=[0.1, 0.2, 0.7]`
#186
deependujha
closed
6 days ago
1
Dataset weightage bug
#185
yhl48
closed
6 days ago
2
Feat: Append data to pre-optimize dataset
#184
deependujha
closed
6 days ago
1
LitData doesn't support s3 bucket connection outside server
#183
sanyalsunny111
opened
1 week ago
11
train_test_split doesn't support split of 0.0
#182
robmarkcole
closed
6 days ago
4
Using fsspec to download files
#181
samsja
opened
1 week ago
3
Feat: Append data to pre-optimize dataset
#180
deependujha
closed
1 week ago
2
Batch size beginning to vary half way through epoch
#179
MarcoForte
opened
1 week ago
6
Magic FileSerializer is causing issues
#178
tchaton
opened
1 week ago
3
Bump pypa/gh-action-pypi-publish from 1.8.14 to 1.9.0
#177
dependabot[bot]
closed
6 days ago
0
Release version 0.2.12
#176
tchaton
closed
2 weeks ago
0
AttributeError: `np.sctypes` was removed in the NumPy 2.0 release.
#175
bhimrazy
opened
2 weeks ago
8
Update README with config for MinIO
#174
bhimrazy
closed
2 weeks ago
1
Resolve num_workers when the user provides 0
#173
tchaton
closed
2 weeks ago
0
Warning Message When Using StreamingDataset with DDP
#172
taemincho
opened
2 weeks ago
2
Remove crt
#171
tchaton
closed
2 weeks ago
0
Add Example to support with MinIO Configuration
#170
bhimrazy
closed
2 weeks ago
1
fix typo : aws cli url in readme
#169
bhimrazy
closed
2 weeks ago
1
Typo: AWS CLI url points to s3 url
#168
bhimrazy
closed
2 weeks ago
1
Bump version 0.2.10
#167
tchaton
closed
2 weeks ago
0
Fix litdata on colab
#166
tchaton
closed
2 weeks ago
0
Bump version 0.2.9
#165
tchaton
closed
2 weeks ago
0
(fix) CombinedDataset with more than 2 streaming datasets
#164
tchaton
closed
2 weeks ago
0
Add support for custom collate with the StreamingDataLoader
#163
tchaton
closed
3 weeks ago
0
Remove DataLoader example in README
#162
tchaton
closed
3 weeks ago
0
Add feature to slice, subsample and split dataset
#161
deependujha
closed
2 weeks ago
2
Add first draft for multi modal model training text & image
#160
rakro101
closed
2 weeks ago
3
LitData leaves a `status.json` in current working dir
#159
awaelchli
opened
3 weeks ago
2
ValueError: The provided None isn't supported.
#158
awaelchli
opened
3 weeks ago
1
Error: All weights must be positive
#157
awaelchli
opened
3 weeks ago
2