Closed tmabraham closed 2 years ago
I need to work on this. I want to add webdataset too, plus some special-cased datasets like MNIST and CIFAR-10, and I'm not sure how to support this level of flexibility yet...
What do you want me to do with this PR? Shall I close it and work on a more general dataset loader? We can brainstorm further on Discord.
I think I'm ready to look at HF Datasets again now that I have added a "dataset" section with a "type" key to the model config files. :)
Adding support for HuggingFace Datasets.
I add the `--hf-datasets` flag, which indicates that the string passed to `--train-set` is a HuggingFace Dataset. I assume that there is a `train` split (as is most often the case), and the key for the images in the dataset is provided by `--hf-datasets-key`.
Hopefully this is fine, but let me know if you want me to change this interface in any way...
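For discussion, here is a minimal sketch of how the interface described above could be wired up. It is not the code in this PR: the three flag names come from the description, but the `'image'` default, the `ImageColumn` wrapper, and the `load_train_set` helper are hypothetical names chosen for illustration. The only library call assumed is `datasets.load_dataset(name, split='train')`.

```python
import argparse


def build_parser():
    # Flag names follow the PR description; the defaults are assumptions.
    p = argparse.ArgumentParser()
    p.add_argument('--train-set', type=str, required=True,
                   help='local image folder, or a dataset name if --hf-datasets is set')
    p.add_argument('--hf-datasets', action='store_true',
                   help='treat --train-set as a HuggingFace Dataset')
    p.add_argument('--hf-datasets-key', type=str, default='image',
                   help='column of the dataset that holds the images')
    return p


class ImageColumn:
    """Hypothetical wrapper that exposes one column of a dataset as 1-tuples."""

    def __init__(self, dataset, key, transform=None):
        self.dataset = dataset
        self.key = key
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Each row is expected to be a dict-like mapping of column name to value.
        image = self.dataset[index][self.key]
        if self.transform is not None:
            image = self.transform(image)
        return (image,)


def load_train_set(args, transform=None):
    if not args.hf_datasets:
        raise NotImplementedError('local folder loading is outside this sketch')
    # Imported lazily so the dependency is only needed when the flag is set.
    from datasets import load_dataset
    # Assumes a 'train' split exists, as stated in the description.
    dataset = load_dataset(args.train_set, split='train')
    return ImageColumn(dataset, args.hf_datasets_key, transform)
```

Keeping the column lookup in a small wrapper means the rest of the training loop would not need to know whether the images came from a folder or from a HuggingFace Dataset.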