FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
https://arxiv.org/abs/2406.06525
MIT License

Difficulty in reproducing results with pre-trained weights #41

Open · Rishit-dagli opened this issue 2 months ago

Rishit-dagli commented 2 months ago

I was trying to run https://github.com/FoundationVision/LlamaGen/blob/main/autoregressive/sample/sample_t2i.py across different seeds, and I also played around with the sampling parameters, but I have been unable to reproduce similar-looking outputs with the 512 x 512 model, which produces these outputs:

[image: reference 512 x 512 outputs for the prompts below]

for these prompts

"A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grassin front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!",
"A blue Porsche 356 parked in front of a yellow brick wall.",
"A photo of an astronaut riding a horse in the forest. There is a river in front of them with water lilies.",
"A map of the United States made out of sushi. It is on a table next to a glass of red wine."

I was wondering if you had any tips on reproducing inference results?
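For reference, this is roughly how I pin the RNG state before calling the sampler. This is a minimal sketch: `seed_everything` is my own helper, not part of the repo, and the script's own seeding logic may override it.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Fix every RNG source so repeated runs draw the same samples."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade some speed for run-to-run stability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)
```

Pinning all of these sources and forcing deterministic cuDNN kernels is usually enough to make repeated runs identical on the same hardware, although it does not guarantee a match with outputs generated elsewhere.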

PeizeSun commented 2 months ago

Hi~ Can you share the exact command line you are running?