Lightning-AI/litdata
Transform datasets at scale. Optimize datasets for fast AI model training.
Apache License 2.0
374 stars
42 forks
issues
fix: Add mechanism to inform the user a new version is available
#420
tchaton
closed
16 hours ago
1
Add support for multi node for Optimize & StreamingDataset
#419
tchaton
opened
3 days ago
0
Add missing interruptible argument for creating jobs
#418
tchaton
closed
4 days ago
1
Litdata optimize is very slow
#417
nightingal3
closed
3 days ago
12
Add OCI as an object storage backend for StreamingData
#416
dkennetzoracle
closed
1 week ago
7
Change S3Client to use user-provided storage_options even in Studio
#415
grez72
closed
1 week ago
7
use storage_options even when IS_IN_STUDIO
#414
grez72
closed
1 week ago
0
Multithreading function for merge_datasets
#413
yhl48
closed
2 weeks ago
1
Fix AttributeError in `BinaryReader` Destructor Due to Non-Existent `_prepare_thread` Attribute
#412
Kidand
closed
2 weeks ago
0
`StreamingDataloader` is not split on each rank when training
#411
Aceticia
closed
2 weeks ago
8
Bump version to 0.2.30
#410
bhimrazy
closed
2 weeks ago
1
Clear Examples of use with different dataset types and code changes.
#409
Woodr7
opened
3 weeks ago
2
training hangs with lightning ddp and cloud dir?
#408
rxqy
opened
3 weeks ago
8
📝 docs: specify custom cache directory
#405
bhimrazy
closed
4 weeks ago
1
Fix broken link for CONTRIBUTING.md
#404
bhimrazy
closed
3 weeks ago
1
`use_checkpoint=True` creates invalid config.json file
#403
cyrildiagne
closed
3 weeks ago
4
incorrect dataloader length when `drop_last=False`
#402
grez72
opened
4 weeks ago
1
Feat/add support for numpy datatypes in tokensloader
#401
bhimrazy
closed
2 weeks ago
1
Feature: Add support for numpy datatypes in TokensLoader
#400
bhimrazy
closed
2 weeks ago
0
Feat: add support for custom cache dir in Streaming Dataset
#399
bhimrazy
closed
4 weeks ago
1
Existing Cache files leads to permanent DataLoader hang
#398
lilavocado
closed
4 weeks ago
5
pass storage options to s5cmd
#397
bhimrazy
closed
1 month ago
2
Combine Small StreamingDatasets into 1 Large StreamingDataset
#396
schopra8
closed
3 weeks ago
5
correct the chunk size by adding header size
#395
tchaton
closed
1 month ago
1
correct the chunk size by adding header size
#394
dangthatsright
closed
1 month ago
2
Writing / Reading Bug involving writer `chunk_bytes` information
#393
dangthatsright
closed
1 month ago
5
Add Support for Custom S3 Configuration in s5cmd
#392
csy1204
closed
1 month ago
2
CONTRIBUTING.md for LitData
#391
deependujha
closed
1 month ago
5
fix: non-deterministic CI test failure
#390
deependujha
closed
1 month ago
1
`One of the worker has failed` error in test
#389
deependujha
closed
1 month ago
1
TreeSpec Error Accessing Data
#388
jmoller93
closed
1 month ago
5
Improve CombinedStreamingDataset to handle multiple subdatasets efficiently
#386
bhimrazy
opened
1 month ago
0
📝 Update Docs: Merge multiple optimized datasets into one
#385
bhimrazy
closed
1 month ago
1
update tags in pkg metadata
#384
Borda
closed
1 month ago
1
Bump version 0.2.29
#383
deependujha
closed
2 months ago
3
Update `PL Data` to `LitData`
#382
bhimrazy
closed
2 months ago
1
Fix/large num chunks error
#381
bhimrazy
closed
1 month ago
3
Revert "Feat: Using fsspec to download files"
#380
tchaton
closed
2 months ago
4
Bump version to 0.2.27
#379
bhimrazy
closed
2 months ago
2
Bump version to 0.2.27.dev
#378
rasbt
closed
2 months ago
2
fix import & assignment issue
#377
Borda
closed
2 months ago
2
improve hint readability
#376
Borda
closed
2 months ago
2
Fix: Chunks deletion issue
#375
deependujha
closed
2 months ago
11
fixing docstrings
#374
Borda
closed
2 months ago
2
reduce unnecessary `pass`
#373
Borda
closed
2 months ago
2
remove not violated bandit rules from ignore
#372
Borda
closed
2 months ago
1
fixing typos in errors & docs
#371
Borda
closed
2 months ago
2
The config isn't consistent between chunks
#370
AugustDev
opened
2 months ago
5
switch `lightning-cloud` to lightning SDK
#369
Borda
closed
2 months ago
3
How can I shut down automatically distributing data when using StreamingDataset?
#368
ygtxr1997
opened
2 months ago
3