google-research / dex-lang

Research language for array processing in the Haskell/ML family
BSD 3-Clause "New" or "Revised" License

Create a Dex datasets library #458

Open: dan-zheng opened this issue 3 years ago

dan-zheng commented 3 years ago

Motivation

Create a structured datasets library within Dex: lib/datasets.dx.

The library should enable straightforward usage of machine learning datasets, including the following:

Implementation ideas

Prior work

oxinabox commented 3 years ago

Prior Work:

apaszke commented 3 years ago

Other prior work: torchvision. Still, I would say that this is somewhat low priority, because I don't expect we'll be able to make a big splash in the hyper-optimized space of standard ML models.

dan-zheng commented 3 years ago

That makes sense, thanks!

srush commented 3 years ago

@dan-zheng I think a nice option here would be to write bindings to Apache Arrow: https://en.wikipedia.org/wiki/Apache_Arrow

https://github.com/huggingface/datasets has a ton of datasets stored in the Arrow format. It seems a bit crazy to rewrite this sort of infrastructure for each language.