CompVis / zigma

A PyTorch implementation of the paper "ZigMa: A DiT-Style Mamba-based Diffusion Model" (ECCV 2024)
https://taohu.me/zigma
Apache License 2.0

Can I input an image whose w is not equal to h? #10

Closed 66ling66 closed 5 months ago

Hiccupwzy commented 5 months ago

I think the image w can differ from h; you can simply resize it to your preferred size. And I have a small question for you: did you reproduce the sample results with faceshq1024_0060000.pt? I followed the instructions but got corrupted pictures rather than clean faces. Thank you for your help. @66ling66
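For reference, a minimal resize sketch along the lines suggested above (the file name and target resolution are illustrative, not taken from the repo):

```python
from PIL import Image
from torchvision import transforms

# Resize a rectangular image to the square resolution the pretrained
# checkpoints expect (1024x1024 here is just an example).
img = Image.open("input.jpg").convert("RGB")
to_tensor = transforms.Compose([
    transforms.Resize((1024, 1024)),  # forces W == H, may distort aspect ratio
    transforms.ToTensor(),
])
x = to_tensor(img)  # shape: (3, 1024, 1024)
```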

Hiccupwzy commented 5 months ago

I changed this line to decode and divide by 0.18215. Then the raw output turns into a colorful picture, but the generated picture is still corrupted.
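For context, here is a minimal sketch of that decoding step, assuming the standard Stable Diffusion VAE scale factor 0.18215; the VAE checkpoint name is an assumption and not necessarily the one this repo uses:

```python
import torch
from diffusers.models import AutoencoderKL

# Hypothetical VAE checkpoint; the repo may load a different one.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval()

@torch.no_grad()
def latents_to_images(latents: torch.Tensor) -> torch.Tensor:
    # Undo the scaling applied at encoding time, then decode to pixel space.
    imgs = vae.decode(latents / 0.18215).sample  # roughly in [-1, 1]
    return (imgs.clamp(-1, 1) + 1) / 2           # map to [0, 1] for saving
```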

AY-Liu commented 5 months ago

> I changed this line to decode and divide by 0.18215. Then the raw output turns into a colorful picture, but the generated picture is still corrupted.

Same problem. Did you solve it?

Hiccupwzy commented 5 months ago

Not yet. But I retrained facehq1024 using the zigman8 config file. Honestly, my model's performance is not as good as that reported in the paper.

dongzhuoyao commented 5 months ago

Hi, can you try the updated code with "faceshq1024_0090000"?

We have also provided the Landscape1024 checkpoint.

dongzhuoyao commented 5 months ago

If the input W and H are not equal, you can try:

1) Padding the images.
2) Implementing a space-filling curve yourself; you can get some intuition from here: https://github.com/galtay/hilbertcurve (see the sketch after this list).
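A rough sketch of both options; the helper names, padding policy, and the zigzag ordering are my own illustration, not the repo's implementation or the ordering used in the paper:

```python
import torch
import torch.nn.functional as F

# Option 1: pad a rectangular image tensor (C, H, W) to a square.
def pad_to_square(x: torch.Tensor) -> torch.Tensor:
    _, h, w = x.shape
    d = abs(h - w)
    # F.pad order for the last two dims: (left, right, top, bottom)
    pad = (d // 2, d - d // 2, 0, 0) if h > w else (0, 0, d // 2, d - d // 2)
    return F.pad(x, pad)

# Option 2: a simple zigzag ("boustrophedon") scan order over an H x W token
# grid, as intuition for building a space-filling curve on non-square grids.
def zigzag_order(h: int, w: int) -> torch.Tensor:
    idx = torch.arange(h * w).view(h, w)
    idx[1::2] = idx[1::2].flip(-1)  # reverse every other row
    return idx.flatten()            # flat grid indices in scan order
```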