facebookresearch / spdl

Scalable and Performant Data Loading
BSD 2-Clause "Simplified" License

Can we leverage apache arrow as a language-independent internal memory representation? #269

Open npuichigo opened 17 hours ago

mthrok commented 8 hours ago

Hi

I'm not sure what "language-independent" means here, but if you are using Arrow to represent a dataset, you can pass an iterator over it to source and it just works.
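To illustrate the kind of iterator meant above, here is a minimal sketch, not taken from SPDL itself. It assumes the dataset is a `pyarrow.Table` (or any object exposing the same `to_batches()` / `to_pylist()` methods). The helper name `iter_rows` and its `batch_size` parameter are made up for this example.

```python
from typing import Any, Dict, Iterator


def iter_rows(table: Any, batch_size: int = 1024) -> Iterator[Dict[str, Any]]:
    """Yield one dict per row from an Arrow-style table (hypothetical helper)."""
    # pyarrow.Table.to_batches() splits the table into RecordBatch objects
    # holding at most `max_chunksize` rows each.
    for batch in table.to_batches(max_chunksize=batch_size):
        # RecordBatch.to_pylist() converts the batch into a list of
        # {column_name: value} dicts, one per row.
        yield from batch.to_pylist()
```

With pyarrow installed, something like `iter_rows(pyarrow.table({"path": ["a.jpg", "b.jpg"]}))` should produce a plain Python iterator of row dicts, which could then be handed over as the source. Converting rows to Python objects gives up Arrow's zero-copy benefits, so this only suits cases where each row is a small record such as a file path or label.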

I'm looking into ways to leverage Arrow or Pandas as a dataset representation for a specific use case, but SPDL itself needs no changes for that.