johndpope opened this issue 3 years ago
I did not see that. Thanks for pointing that out. Besides my own dataset, I've been training on gwern's anime figures with tags (as you know): https://www.gwern.net/Crops. I'm only training on 27k images, and the results haven't been great so far. DALLE is really slow to train, and I've only been training when I have spare cycles. The loss keeps going down, though, so we'll see.
I have some spare GPU cycles; if you can help spoon-feed the work, I'll run it on my box.
I've uploaded the models I'm using here: https://mega.nz/folder/XF1CzR4T#E4Mc9IXMZRNLqW4yGkcToA

To train the danbooru2019 model, I use the following:

```shell
python3 dalle.py --source ~/nas2/ai/datasets/danbooru2019-figures/classifier/ --vocab curated_512.vocab \
    --vae vae_dan2019_cb2048_256px.pt --dalle dalle_an2019_cb2048_256px_2.pt \
    --train_dalle --batch_size=1 --vgg_loss --samples_out samples/dan2019_figures/dalle \
    --vae_layers=3 --size 256 --codebook_dims=2048 --epochs=1 --depth 20
```
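For anyone without my `dalle.py` wrapper, here's a minimal sketch of what those flags presumably map to in the lucidrains DALLE-pytorch API. The mapping of `--vae_layers`, `--codebook_dims`, and `--depth` onto the constructor arguments is my assumption, and `dim`, `heads`, `num_text_tokens`, and `text_seq_len` below are placeholder guesses:

```python
# Minimal sketch, assuming the CLI flags map onto the DALLE-pytorch
# constructors roughly like this; dim/heads/text settings are guesses.
import torch
from dalle_pytorch import DiscreteVAE, DALLE

vae = DiscreteVAE(
    image_size = 256,       # --size 256
    num_layers = 3,         # --vae_layers=3
    num_tokens = 2048,      # --codebook_dims=2048 (codebook size)
    codebook_dim = 512,     # assumed
    hidden_dim = 64,        # assumed
)
vae.load_state_dict(torch.load('vae_dan2019_cb2048_256px.pt'))  # --vae

dalle = DALLE(
    dim = 512,              # assumed
    vae = vae,              # pretrained VAE supplies the image tokens
    num_text_tokens = 512,  # size of curated_512.vocab (assumed)
    text_seq_len = 256,     # assumed
    depth = 20,             # --depth 20
    heads = 16,             # assumed
)

# One training step: text is a batch of token ids, images a batch of 256px tensors.
text = torch.randint(0, 512, (1, 256))
images = torch.randn(1, 3, 256, 256)
loss = dalle(text, images, return_loss = True)
loss.backward()
```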
The dalle model is huge; it's uploading now. After that's done I'll upload the dataset. It's a fairly small subset of the larger figures dataset, so you may want to run rsync and pull a larger set yourself. I think I have a script that makes tag/label files from the image set, but I'll have to look around (a sketch of the idea is below).
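Until I dig the script up, the general idea is simple: for each image, look up its Danbooru post id in the metadata and write the tags out as a sibling `.txt` caption file. A rough sketch; the metadata path, filename convention, and JSON keys are assumptions, since dataset layouts vary:

```python
# Rough sketch: write a <id>.txt tag file next to each <id>.jpg.
# Assumes Danbooru-style JSONL metadata where each record has an "id"
# and a list of tags under "tags"; adjust keys to your metadata dump.
import json
from pathlib import Path

image_dir = Path('danbooru2019-figures/classifier')
metadata_file = Path('metadata.jsonl')  # hypothetical path

# Build an id -> "tag1 tag2 ..." lookup from the metadata.
tags_by_id = {}
with metadata_file.open() as f:
    for line in f:
        post = json.loads(line)
        tags_by_id[str(post['id'])] = ' '.join(t['name'] for t in post['tags'])

# Emit one label file per image whose stem matches a post id.
for img in image_dir.glob('*.jpg'):
    tags = tags_by_id.get(img.stem)
    if tags:
        img.with_suffix('.txt').write_text(tags)
```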
Also note that training a dalle model this large requires at least 24 GB of GPU memory. I haven't been able to train anything deeper than depth 20 because of the memory limit. Apparently depth 20 is still rather tiny and may not be able to converge (see the rough parameter count below).
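For a sense of scale, here's a back-of-envelope parameter count for the transformer at depth 20, using the standard ~12·dim² parameters per layer (4·dim² for the attention projections, 8·dim² for the feed-forward with 4x expansion); dim = 512 is the same assumption as in the sketch above:

```python
# Back-of-envelope: params per transformer layer ~= 12 * dim^2
# (4*dim^2 attention projections + 8*dim^2 feed-forward with 4x expansion).
dim, depth = 512, 20          # dim is an assumed value
per_layer = 12 * dim ** 2
total = depth * per_layer
print(f'{total / 1e6:.0f}M parameters')  # ~63M -- tiny next to OpenAI DALL-E's 12B
```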
My dataset is here: https://mega.nz/file/WMkiCYIR#Kuw4Etoj_VD1kPrizJo4IpMM5nq5ll-5nzkZMqGghhk
Did you see this? 78k image-caption pairs (3.8 GB zipped): https://github.com/lucidrains/DALLE-pytorch/issues/7