Lightning-AI / litdata

Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training.
Apache License 2.0

ValueError: The provided None isn't supported. #158

Open awaelchli opened 3 weeks ago

awaelchli commented 3 weeks ago

🐛 Bug

A cryptic error message appears in the case below.

To Reproduce

from litdata import map

def main():
    metadata = [
        ("sa_000020.tar", "https://scontent.xx.fbcdn.net/m1/v/t6/An_YmP5OIPXun-vu3hkckAZZ2s4lPYoVkiyvCcWiVY21mu1Ng5_1HeCa2CWiSTsskj8HQ8bN013HxNpYDdSC_7jWQq_svcg.tar?ccb=10-5&oh=00_AYD1s3ZScMDDgFoEB0IQMFB3T4WR1hTKUQPwhX_LErycdA&oe=667B8BA8&_nc_sid=0fdd51"),
        ("sa_000021.tar", "https://scontent.xx.fbcdn.net/m1/v/t6/An-V_ojE_rwIRA0Lm6ni3MZPstlaE0JR_HiStyDgfjjjbnkhEigM2QU12FZTwsTRRmE98acikrWLFcMSw0NW6fNJcURx_Kw.tar?ccb=10-5&oh=00_AYAJCM8XdYl407R0oYW-vzRWUeGPkr0y5APjqcwGjnHjTg&oe=667BB81C&_nc_sid=0fdd51"),
    ]

    map(
        fn=download, 
        inputs=metadata, 
        output_dir="data",
        weights=[1 / len(metadata)] * len(metadata),
    )

def download(item, output_dir):
    filename, url = item

if __name__ == '__main__':
    main()

Traceback (most recent call last):
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/litdata/processing/data_processor.py", line 177, in _download_data_target
    raise ValueError(f"The provided {input_dir.url} isn't supported.")
ValueError: The provided None isn't supported.

Expected behavior

I expect this to at least be a meaningful error. I executed this in a Studio. All I wanted was to run the map function to download dataset files.
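To illustrate what a more meaningful error could look like, here is a minimal sketch. It is not litdata's actual implementation: `Dir`, `validate_input_dir`, and `SUPPORTED_SCHEMES` are hypothetical names made up for this example, and the list of supported prefixes is an assumption. The only point is that the `None` case gets its own message instead of being formatted into "The provided None isn't supported."

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dir:
    """Hypothetical stand-in for the object that carries the input location."""

    path: Optional[str] = None
    url: Optional[str] = None


# Assumption for illustration only; the real set of supported schemes may differ.
SUPPORTED_SCHEMES = ("s3://",)


def validate_input_dir(input_dir: Dir) -> None:
    """Raise a descriptive error rather than formatting None into the message."""
    if input_dir.url is None:
        # Handle the missing-URL case separately so the user knows what happened.
        raise ValueError(
            "No remote input directory was resolved (url is None), "
            "so there is nothing to download for these inputs."
        )
    if not input_dir.url.startswith(SUPPORTED_SCHEMES):
        raise ValueError(
            f"The provided URL {input_dir.url!r} isn't supported. "
            f"Supported prefixes: {', '.join(SUPPORTED_SCHEMES)}"
        )
```

With a check along these lines, the reproduction above would report that no remote URL was resolved, which points at the inputs rather than at an unnamed `None`.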

Environment

Lightning Studio, litdata==0.2.8

tchaton commented 2 weeks ago

@awaelchli Can you try again with main? I can't reproduce it anymore.