Lightning-AI / litdata

Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training.
Apache License 2.0

Error: All weights must be positive #157

Open awaelchli opened 3 weeks ago

awaelchli commented 3 weeks ago

🐛 Bug

To Reproduce

The following usage of LitData's map function fails with "ValueError: All weights must be positive.", even though I never pass any weights.

from litdata import map

def main():
    metadata = [
        ("sa_000020.tar", "https://scontent.xx.fbcdn.net/m1/v/t6/An_YmP5OIPXun-vu3hkckAZZ2s4lPYoVkiyvCcWiVY21mu1Ng5_1HeCa2CWiSTsskj8HQ8bN013HxNpYDdSC_7jWQq_svcg.tar?ccb=10-5&oh=00_AYD1s3ZScMDDgFoEB0IQMFB3T4WR1hTKUQPwhX_LErycdA&oe=667B8BA8&_nc_sid=0fdd51"),
        ("sa_000021.tar", "https://scontent.xx.fbcdn.net/m1/v/t6/An-V_ojE_rwIRA0Lm6ni3MZPstlaE0JR_HiStyDgfjjjbnkhEigM2QU12FZTwsTRRmE98acikrWLFcMSw0NW6fNJcURx_Kw.tar?ccb=10-5&oh=00_AYAJCM8XdYl407R0oYW-vzRWUeGPkr0y5APjqcwGjnHjTg&oe=667BB81C&_nc_sid=0fdd51"),
    ]

    map(
        fn=download, 
        inputs=metadata, 
        output_dir="data", 
    )

def download(item, output_dir):
    filename, url = item

if __name__ == '__main__':
    main()

Traceback (most recent call last):
  File "/teamspace/studios/this_studio/repro.py", line 22, in <module>
    main()
  File "/teamspace/studios/this_studio/repro.py", line 11, in main
    map(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/litdata/processing/functions.py", line 252, in map
    return data_processor.run(LambdaDataTransformRecipe(fn, inputs))
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/litdata/processing/data_processor.py", line 931, in run
    workers_user_items = _map_items_to_workers_weighted(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/litdata/processing/data_processor.py", line 307, in _map_items_to_workers_weighted
    worker_items, worker_weights = _pack_greedily(items=user_items, weights=weights, num_bins=world_size)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/litdata/utilities/packing.py", line 25, in _pack_greedily
    raise ValueError("All weights must be positive.")
ValueError: All weights must be positive.

Expected behavior

I would expect that the weights are inferred automatically by default as [1 / len(inputs)] * len(inputs).
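
To make the expected default concrete, here is a minimal sketch of uniform weights feeding a greedy bin-packing step. It is not litdata's actual code: `default_weights` and `pack_greedily_sketch` are hypothetical names invented for illustration, and the packing strategy (heaviest item first, into the least-loaded bin) is an assumption about how such a helper could work.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def default_weights(inputs: Sequence[Any]) -> List[float]:
    """Hypothetical default: every item gets the same weight, 1 / len(inputs)."""
    if not inputs:
        return []
    return [1 / len(inputs)] * len(inputs)


def pack_greedily_sketch(
    items: Sequence[Any],
    weights: Optional[Sequence[float]] = None,
    num_bins: int = 1,
) -> Tuple[Dict[int, List[Any]], Dict[int, float]]:
    """Assign items to bins, heaviest first, always into the least-loaded bin."""
    if weights is None:
        # Assumed behavior from the issue: infer uniform weights when none are given.
        weights = default_weights(items)
    if len(items) != len(weights):
        raise ValueError("Items and weights must have the same length.")
    if any(w <= 0 for w in weights):
        raise ValueError("All weights must be positive.")

    bins: Dict[int, List[Any]] = {i: [] for i in range(num_bins)}
    loads: Dict[int, float] = {i: 0.0 for i in range(num_bins)}

    # sorted() is stable, so items with equal weights keep their input order.
    for item, weight in sorted(zip(items, weights), key=lambda p: p[1], reverse=True):
        target = min(loads, key=loads.get)  # least-loaded bin, lowest index on ties
        bins[target].append(item)
        loads[target] += weight
    return bins, loads


if __name__ == "__main__":
    files = ["sa_000020.tar", "sa_000021.tar"]
    print(pack_greedily_sketch(files, num_bins=2))
```

With uniform weights, the two inputs from the reproduction would be split evenly across two workers, and the positivity check could never fail for a non-empty input list.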

Environment

Lightning Studio, litdata==0.2.8

Additional context

deependujha commented 2 weeks ago

I can't reproduce the error.

Can you share more details on the code?

tchaton commented 2 weeks ago

Same here, I can't reproduce this with the latest version of litdata. cc @awaelchli, mind checking?