Lightning-AI
/
litdata
Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training.
Apache License 2.0
249
stars
24
forks
issues
Bump coverage from 7.5.0 to 7.5.3
#149
dependabot[bot]
closed
2 weeks ago
1
Bump pytest from 8.2.0 to 8.2.1
#148
dependabot[bot]
closed
4 weeks ago
0
Fix: Resolve drop_last not passed down from the StreamingDataLoader to the datasets
#147
tchaton
closed
1 month ago
0
Performance improvement for processing
#146
sritterginkgo
closed
1 month ago
6
Support splitting datasets
#145
robmarkcole
closed
2 weeks ago
8
NameError: name 'V1DatasetType' is not defined
#144
robmarkcole
closed
1 month ago
1
Update README.md
#143
tchaton
closed
1 month ago
0
Bump LitData version 0.2.7
#142
tchaton
closed
1 month ago
0
RAM increasing during first epoch of training
#141
rakro101
opened
1 month ago
4
Data shard deletion with multi-GPU does not work
#140
rakro101
opened
1 month ago
4
Add support for exact iteration
#139
tchaton
closed
1 month ago
0
Training slows down as time progresses with litdata streaming dataset
#138
ouj
opened
1 month ago
3
Make `optimize` continue from last checkpoint after crash
#137
cgebbe
opened
1 month ago
3
Prevent race deletion
#136
tchaton
closed
1 month ago
0
Subsample StreamingDataset
#135
yhl48
closed
2 weeks ago
2
Adding breakpoint in `random_images` function crashes pdb
#134
cgebbe
opened
1 month ago
4
StreamingDataset incompatibility with PyTorch Lightning
#133
enrico-stauss
closed
1 month ago
11
Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.1
#132
dependabot[bot]
closed
1 month ago
0
Fix empty tensor deserialization
#131
enrico-stauss
closed
1 month ago
1
DataChunkRecipe is not working when used in litgpt's TinyLlama pretraining example
#130
wen020
opened
1 month ago
5
PyTorch Lightning Fabric + LitData + DDP hangs when finishing epoch
#129
miguelalba96
opened
1 month ago
3
Stream selected channels
#128
robmarkcole
opened
1 month ago
0
Fix infinite sleep when loading local compressed dataset.
#127
wzf03
closed
1 month ago
15
Cache directory resolution issues in Google Colab
#126
awaelchli
closed
2 weeks ago
6
Fix configuration of a custom serializer for one of the predefined types
#125
enrico-stauss
closed
1 month ago
6
Fix the NoHeaderTensorSerializer for 1D tensors (other than tokens)
#124
enrico-stauss
closed
1 month ago
6
Update version 0.2.6
#123
tchaton
closed
1 month ago
0
Add support for `iterate_over_all` for the CombinedDataset
#122
tchaton
closed
1 month ago
0
Resolve some bugs
#121
tchaton
closed
1 month ago
0
Optimizing dictionary data structures fails when using a partially initialized function
#120
enrico-stauss
opened
1 month ago
2
Time per sample grows as the number of processed samples grows
#119
scritter
closed
1 month ago
23
Slow Dataset Preprocessing due to CPU affinity (?) issues
#118
mgolub2
opened
1 month ago
5
Bump coverage from 7.4.4 to 7.5.0
#117
dependabot[bot]
closed
2 months ago
0
Bump pytest-cov from 4.1.0 to 5.0.0
#116
dependabot[bot]
closed
2 months ago
2
Bump pytest from 8.0.2 to 8.2.0
#115
dependabot[bot]
closed
2 months ago
0
Update sphinx requirement from <7.0,>=6.0 to >=6.0,<8.0
#114
dependabot[bot]
closed
2 months ago
2
Partials can return tensors
#113
karibbov
closed
1 month ago
2
Progress bar missing with `litdata.StreamingDataset` and wrong number of steps in an epoch
#112
yhl48
closed
1 month ago
4
Bump version 0.2.5
#111
tchaton
closed
2 months ago
0
Remove condition on torch installation
#110
tchaton
closed
2 months ago
0
Move to version 0.2.4
#109
tchaton
closed
2 months ago
0
StreamingDataset torch compatibility
#108
yhl48
closed
2 months ago
1
StreamingDataset support for older PyTorch versions
#107
gn-yh-lim
closed
2 months ago
1
Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.0
#106
dependabot[bot]
closed
1 month ago
1
`optimize` function on multiple machines writing to local paths
#105
rakro101
opened
2 months ago
0
Please add s3 path support to optimize (read and write to s3)
#104
rakro101
closed
1 month ago
5
Dataloading is not working when used in litgpt's debug pretraining example
#103
iloshchilov
opened
2 months ago
4
ValueError: buffer size must be a multiple of element size
#102
awaelchli
opened
2 months ago
0
Question: is there a plan to support streaming from GCS?
#101
dnnspark
closed
3 weeks ago
7
Fix `map()` failing to create dataset when `input_dir` is None
#100
awaelchli
closed
2 months ago
0