Lightning-AI / litdata

Transform datasets at scale. Optimize datasets for fast AI model training.
Apache License 2.0

Feature: Add support for numpy datatypes in TokensLoader #400

Closed bhimrazy closed 2 weeks ago

bhimrazy commented 4 weeks ago

🚀 Feature: Add support for NumPy datatypes in TokensLoader

Currently, `TokensLoader` only supports PyTorch data types; see https://github.com/Lightning-AI/litdata/blob/62907b301c382d6c1625a10d4d693e06ad33d259/src/litdata/streaming/item_loader.py#L276-L286
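
As a rough illustration of what NumPy dtype support could look like, here is a minimal sketch. It assumes the loader reads fixed-size blocks of tokens out of a raw byte buffer; `read_tokens` and its parameters are hypothetical names used only for this example and are not part of the litdata API.

```python
import numpy as np


def read_tokens(buffer: bytes, dtype, block_size: int, index: int) -> np.ndarray:
    """Return the `index`-th block of `block_size` tokens from a raw byte buffer.

    Hypothetical helper for illustration only; not litdata's implementation.
    """
    # Accept either a NumPy dtype object or its string name, e.g. "uint16".
    np_dtype = np.dtype(dtype)
    # Byte offset of the requested block inside the buffer.
    offset = np_dtype.itemsize * index * block_size
    # Interpret the bytes in place (read-only view, no copy).
    return np.frombuffer(buffer, dtype=np_dtype, count=block_size, offset=offset)


# Example: 12 tokens stored as uint16, read back in blocks of 4.
tokens = np.arange(12, dtype=np.uint16)
raw = tokens.tobytes()
block = read_tokens(raw, np.uint16, block_size=4, index=1)
```

Accepting anything that `np.dtype` understands lets callers pass either `np.uint16` or the string `"uint16"`. The array returned by `np.frombuffer` on a `bytes` object is read-only, so a caller that needs to modify the tokens would have to copy it first.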
