deepglugs / dalle


training tips #5

Closed skywo1f closed 3 years ago

skywo1f commented 3 years ago

Ok, so I believe I've "successfully" completed the training process. Unfortunately, my generated images are messy canvases of color. Do you have any tips on how to improve my training? Right now I am using the 70k image stack scraped from Wikipedia and a curated vocab of the 500 most common words. I am training the vqvae for the default 2 epochs and dalle for the default 2 epochs. Should I train for more epochs? Should I trim my data set? Do I need more data? Any other tips?

deepglugs commented 3 years ago

codebook dims and dalle depth are important. I've trained my danbooru figures dataset for 5-6 epochs with 20 depth and am barely starting to see eyes in the results. 2 epochs for dalle is not enough, imo.
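To get a feel for why depth matters so much for training time, here is a rough back-of-envelope estimate of transformer size as a function of depth. The model dimension and the 12·dim² per-block figure are illustrative assumptions, not values from this repo:

```python
# Rough estimate of decoder transformer size as a function of depth.
# (Hypothetical numbers -- the repo's actual model dims may differ.)

def approx_transformer_params(depth: int, dim: int) -> int:
    """Each transformer block has roughly 12 * dim^2 weights
    (4*dim^2 for the attention projections, 8*dim^2 for the MLP)."""
    return depth * 12 * dim * dim

# Doubling-plus the depth scales parameters (and step time) in proportion:
base = approx_transformer_params(depth=20, dim=512)
deep = approx_transformer_params(depth=48, dim=512)
print(deep / base)  # 2.4
```

So going from depth 20 to the default 48 makes each training step roughly 2.4x more expensive, which is why deep models at few epochs can underperform shallower models trained longer.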


skywo1f commented 3 years ago

it looks like the default depth is 48 (which I've been using): parser.add_argument('--depth', default=48, type=int)
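For anyone following along, the line above means the script accepts a command-line override, so a smaller depth can be tried without editing the source. A minimal self-contained reproduction of that flag:

```python
import argparse

# Minimal reproduction of the flag in question: the default is 48,
# but it can be overridden on the command line (e.g. --depth 20).
parser = argparse.ArgumentParser()
parser.add_argument('--depth', default=48, type=int)

print(parser.parse_args([]).depth)                  # 48
print(parser.parse_args(['--depth', '20']).depth)   # 20
```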

but yeah, I will probably have to trim my dataset. Otherwise 6 epochs will take about a week haha.

deepglugs commented 3 years ago

Also make sure the VAE samples look good. Train until they do. If they don't, you'll never get good results from dalle.
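One way to make "the samples look good" concrete is to compare VAE reconstructions against their originals with a simple error metric. This is a pure-Python stand-in (the pixel lists and the 0.01 threshold are illustrative placeholders; in practice the images would come from the trained VAE):

```python
# Sketch of a reconstruction sanity check for VAE outputs.

def recon_mse(original, reconstruction):
    """Mean squared error between two flat pixel lists (values in [0, 1])."""
    assert len(original) == len(reconstruction)
    return sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)

def vae_looks_ok(pairs, threshold=0.01):
    """True if every (original, reconstruction) pair is below the MSE
    threshold. The 0.01 cutoff is an illustrative placeholder."""
    return all(recon_mse(o, r) < threshold for o, r in pairs)

good = [([0.1, 0.5, 0.9], [0.11, 0.49, 0.90])]
bad = [([0.1, 0.5, 0.9], [0.90, 0.10, 0.20])]
print(vae_looks_ok(good), vae_looks_ok(bad))  # True False
```

If reconstructions fail a check like this, more VAE training (or a different codebook size) is needed before dalle training can pay off.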


skywo1f commented 3 years ago

ok, so I:

  1. narrowed my vocab list down to 76 words
  2. used 26k images for those words
  3. trained the vae for 10 epochs, down to a loss of 0.000563
  4. checked that the vae samples looked ok (they just looked like regular images)
  5. used that vae to train dalle for 10 epochs (this took ~5 days)
  6. got the dalle loss down to 0.108623

Still, when I ask it for a simple image (blue, flower) it gives me some colorless abstract art. I get similar nonsense when I ask it to generate other samples. [image: my_image]

Am I

  1. missing some big step
  2. needing a better training set
  3. just bad at training in general?

skywo1f commented 3 years ago

this is a sample from the last epoch of the vae: [image: vae_10_1600]

here is a sample from the last epoch of dalle: [image: dalle_10_1600]

deepglugs commented 3 years ago

The samples look good. What is your codebook size? Looks like the paper used a size of 1024. 26k images isn't a lot compared to what the paper trained on. Try other vocab combinations?
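One thing worth keeping straight here: codebook vocabulary size is separate from the number of image tokens per picture. In the DALL-E paper the dVAE has an 8192-entry codebook, while the image is encoded as a 32x32 grid, i.e. 1024 tokens per image, so the two "1024"s are easy to conflate. The downsampling factor below is illustrative:

```python
# The number of discrete image tokens dalle must model is set by the
# image size and the VAE's downsampling factor (numbers illustrative).

def image_token_count(image_size: int, vae_downsample: int) -> int:
    side = image_size // vae_downsample
    return side * side

# A 256x256 image through an 8x-downsampling VAE gives a 32x32 grid:
print(image_token_count(256, 8))  # 1024
```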


skywo1f commented 3 years ago

My codebook size is the default 1024. I had 466 samples of flowers and 437 samples of blue. It didn't even come close to drawing either. I am not sure how changing the vocab will change my results.

deepglugs commented 3 years ago

I mean try "red flower" or "flower blue" and see if that changes things. Also, did you use a CLIP model? That will help guide the results.
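The usual way CLIP helps is by re-ranking: generate several candidate images for the prompt, score each with CLIP's image-text similarity, and keep the best. A minimal sketch of that loop, with a toy scorer standing in for a real CLIP model (the names and precomputed scores are hypothetical, not this repo's API):

```python
# Sketch of CLIP-style re-ranking: generate several candidates, score each
# against the text prompt, keep the top-scoring ones. score_fn is a
# stand-in for a real CLIP similarity function.

def rerank(candidates, prompt, score_fn, top_k=1):
    ranked = sorted(candidates, key=lambda img: score_fn(img, prompt), reverse=True)
    return ranked[:top_k]

# Toy scorer: pretend similarities were precomputed in a dict.
scores = {"img_a": 0.12, "img_b": 0.31, "img_c": 0.27}
best = rerank(list(scores), "blue flower", lambda img, _: scores[img])
print(best)  # ['img_b']
```

Even a weak generator looks much better after this filter, since only the candidates CLIP considers on-prompt survive.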
