raspiduino opened 1 year ago
@raspiduino Hi, this is simply because there is no publicly available pretrained checkpoint for Imagen (in fact, Stable Diffusion is the only large pretrained text-to-image model we can access).
Thank you for replying! Can I ask another question?
I saw the option to use CLIP instead of Stable Diffusion. I tried it (passing the parameter to main.py), but the generated 3D model has really low quality, even after training for 5000 steps (which should be enough with Stable Diffusion).
The CLIP version runs much faster (about 10 minutes), but its quality is really bad. So my question is: why is the quality so bad, and how can I improve it? Thanks!
CLIP guidance is in fact the earlier work dreamfields, and its quality is indeed worse. You can find some good examples here: https://github.com/shengyu-meng/dreamfields-3D
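For context, CLIP guidance (as in dreamfields) optimizes the 3D representation by maximizing CLIP similarity between rendered views and the text prompt; there is no diffusion prior behind it, which helps explain why its quality lags behind SD guidance. A minimal sketch of that objective (random tensors stand in for the CLIP embeddings here, since loading an actual CLIP model is omitted; the function name is illustrative, not from the repo):

```python
import torch
import torch.nn.functional as F

def clip_guidance_loss(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity: lower loss = rendered view matches the prompt better."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return -(image_emb * text_emb).sum(dim=-1).mean()

# Stand-ins for 512-dim CLIP ViT-B/32 embeddings of 4 rendered views and one prompt.
rendered_views = torch.randn(4, 512, requires_grad=True)
prompt = torch.randn(512)

loss = clip_guidance_loss(rendered_views, prompt)
loss.backward()  # in the real pipeline, gradients flow back into the NeRF renderer
```

Because the only signal is a single similarity score per view, the optimizer tends to find "adversarial" shapes that satisfy CLIP without looking good, whereas SD guidance supervises the render with a full image prior.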
I know it might be stupid to open this issue, but I can't find the Discussions tab on GitHub to ask about this :)) Also, I'm new to text-to-image and text-to-3D things :)) I'm writing this while waiting for my model to finish 5000 steps. The readme reads:
I understand that there is something that makes SD different from Google's Imagen, and that this requires an extra conversion and therefore extra time per iteration.
So my question is: instead of using SD, can we use [Imagen-pytorch], an open-source PyTorch implementation of Google's Imagen, to generate the images? Would that reduce the training time?
Thank you! And thanks for this wonderful repo!