crowsonkb / k-diffusion

Karras et al. (2022) diffusion models for PyTorch
MIT License
2.26k stars, 372 forks

HuggingFace Datasets support #5

Closed: tmabraham closed this 2 years ago

tmabraham commented 2 years ago

Adding support for HuggingFace Datasets.

I added the --hf-datasets flag, which indicates that the string passed to --train-set is the name of a HuggingFace Dataset. I assume that the dataset has a train split (as is most often the case), and the key for the images in the dataset is given by --hf-datasets-key.
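For readers following along, here is a minimal sketch of how such an interface could fit together. It is not the code from this PR. The flag names come from the description above, but the `'image'` default, the `ImageColumn` wrapper, and the `load_train_set` helper are hypothetical names introduced for illustration. The only HuggingFace call used is `datasets.load_dataset(name, split='train')`.

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser()
    p.add_argument('--train-set', type=str, required=True,
                   help='local dataset path, or a HuggingFace dataset name')
    p.add_argument('--hf-datasets', action='store_true',
                   help='treat --train-set as a HuggingFace dataset name')
    # The default of 'image' is an assumption, not taken from the PR.
    p.add_argument('--hf-datasets-key', type=str, default='image',
                   help='key of the image column in the dataset')
    return p


class ImageColumn:
    """Map-style view over one column of a dataset whose rows are dicts.

    Hypothetical helper: it only needs len() and integer indexing, so it
    works with a HuggingFace Dataset or a plain list of dicts.
    """

    def __init__(self, rows, key, transform=None):
        self.rows = rows
        self.key = key
        self.transform = transform

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        image = self.rows[index][self.key]
        if self.transform is not None:
            image = self.transform(image)
        return image


def load_train_set(args, transform=None):
    """Hypothetical helper: build the training set from parsed arguments."""
    if not args.hf_datasets:
        raise NotImplementedError('local loading is left out of this sketch')
    # Imported lazily so the rest of the sketch runs without `datasets`.
    from datasets import load_dataset
    rows = load_dataset(args.train_set, split='train')  # assumes a train split
    return ImageColumn(rows, args.hf_datasets_key, transform)
```

Because `ImageColumn` defines `__len__` and `__getitem__`, it can be passed to a PyTorch `DataLoader` like any other map-style dataset.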

Hopefully this is fine, but let me know if you want me to change this interface in any way...

crowsonkb commented 2 years ago

I need to work on this. I want to add webdataset too, along with some special-cased datasets like MNIST and CIFAR-10, and I'm not sure how to support this level of flexibility yet...

tmabraham commented 2 years ago

What do you want me to do with this PR? Shall I close it and work on a more general dataset loader? We can brainstorm further on Discord.

crowsonkb commented 2 years ago

I think I'm ready to look at HF Datasets again now that I have added a "dataset" section with a "type" key to the model config files. :)
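As an illustration of how a "dataset" section with a "type" key can select a loader, here is a minimal dispatch sketch. Only the `"dataset"` and `"type"` names come from the comment above. The type values (`"imagefolder"`, `"huggingface"`), the `"location"` and `"image_key"` fields, and all function names are hypothetical, and the builders return plain descriptions instead of loading real data.

```python
import json

# Registry mapping a dataset "type" string to a builder function.
DATASET_BUILDERS = {}


def register(type_name):
    """Decorator that records a builder under the given type name."""
    def decorator(fn):
        DATASET_BUILDERS[type_name] = fn
        return fn
    return decorator


@register('imagefolder')  # hypothetical type name
def build_imagefolder(cfg):
    return {'loader': 'imagefolder', 'root': cfg['location']}


@register('huggingface')  # hypothetical type name
def build_huggingface(cfg):
    return {
        'loader': 'huggingface',
        'name': cfg['location'],
        'split': 'train',
        'image_key': cfg.get('image_key', 'image'),  # assumed default
    }


def build_dataset(config):
    """Look up the builder named by config['dataset']['type'] and call it."""
    dataset_cfg = config['dataset']
    kind = dataset_cfg['type']
    try:
        builder = DATASET_BUILDERS[kind]
    except KeyError:
        raise ValueError(f'unknown dataset type: {kind!r}') from None
    return builder(dataset_cfg)


# Example config, assuming the model config is JSON.
config = json.loads(
    '{"dataset": {"type": "huggingface", "location": "mnist", "image_key": "image"}}'
)
spec = build_dataset(config)
```

With a registry like this, supporting webdataset or a special-cased dataset such as MNIST or CIFAR-10 would mean adding one more registered builder, without changing the command-line flags.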