FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
https://arxiv.org/abs/2406.06525
MIT License

Difficulty in reproducing results with pre-trained weights #41

Open · Rishit-dagli opened this issue 2 months ago

Rishit-dagli commented 2 months ago

I was trying to run https://github.com/FoundationVision/LlamaGen/blob/main/autoregressive/sample/sample_t2i.py across different seeds, and I also played around with the sampling parameters, but I have been unable to reproduce similar-looking outputs with the 512 x 512 model, which produces these outputs:

[image: reference 512 x 512 outputs for the prompts below]

for these prompts

"A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grassin front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!",
"A blue Porsche 356 parked in front of a yellow brick wall.",
"A photo of an astronaut riding a horse in the forest. There is a river in front of them with water lilies.",
"A map of the United States made out of sushi. It is on a table next to a glass of red wine."

I was wondering if you had any tips on reproducing inference results?
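For reference, this is roughly how I pin the RNG state before calling the sampler. This is a minimal sketch: `seed_everything` is my own helper, not part of the repo, and the script's own seeding logic may override it.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Fix every RNG source so repeated runs draw the same samples."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade some speed for run-to-run stability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)
```

Pinning all of these sources and forcing deterministic cuDNN kernels is usually enough to make repeated runs identical on the same hardware, although it does not guarantee a match with outputs generated elsewhere.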

PeizeSun commented 2 months ago

Hi~ Can you share the exact command line you are running?