MegEngine / MegDiffusion

MegEngine implementation of Diffusion Models.
Apache License 2.0

eps for GroupNorm #5

Closed Asthestarsfalll closed 2 years ago

Asthestarsfalll commented 2 years ago

Great work! The parameter 'eps' in GroupNorm is initialized to 1e-5 by default. However, GroupNorm in TensorFlow is slightly different: it is initialized with 1e-6. Maybe it doesn't have any influence on training results, but could you modify this (for all GroupNorm layers in the code) for alignment? Since I want to convert trained models from torch or tf to MegEngine, the smaller the error, the better.
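To be concrete, a minimal sketch of what I mean (the num_groups/num_channels values are just illustrative, not from this repo):

import megengine.module as M

# MegEngine's GroupNorm defaults to eps=1e-5; passing eps=1e-6 aligns it
# with the TensorFlow implementation.
norm = M.GroupNorm(num_groups=32, num_channels=128, eps=1e-6)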

ChaiByte commented 2 years ago

Thanks for watching this project! The DDPM model was based on some PyTorch implementations at first, and I'm glad to hear that you are willing to convert the original pre-trained models to MegEngine. Here is some information that might be helpful:

In my opinion, conversion scripts are also important for users to understand where converted pre-trained models come from. So I suggest you upload them to this repo, which could encourage more users to join us.
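For example, a conversion script might start from something like this minimal sketch (the file paths are placeholders, and it assumes the torch and MegEngine models share parameter names and shapes):

import torch
import megengine as mge

def convert(torch_ckpt_path, mge_model, out_path):
    # Load the PyTorch checkpoint on CPU and turn tensors into numpy
    # arrays, which MegEngine state dicts are based on.
    torch_state = torch.load(torch_ckpt_path, map_location="cpu")
    mge_state = {k: v.numpy() for k, v in torch_state.items()}
    mge_model.load_state_dict(mge_state)  # assumes matching names/shapes
    mge.save(mge_model.state_dict(), out_path)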

Btw, I'm not sure yet how to develop this library in the future; I hope it will help more people understand the implementation of diffusion models. (OpenAI's improved/guided codebase is great, but lacks readability.)

ChaiByte commented 2 years ago

While developing this repo, I wrote some notes in Chinese to help myself understand diffusion models better. Here is a post: https://meg.chai.ac.cn/ddpm-megengine/ You are welcome to read it and give me some advice.

Asthestarsfalll commented 2 years ago

I'm willing to upload my conversion code, but it doesn't work well after converting: the error between the MegEngine and PyTorch implementations is high with the same input. The cause is that the convolution padding in DownSample differs; the PyTorch implementation uses asymmetric padding (only on the right and bottom) before the stride-2 convolution. After I modified the MegEngine implementation as below, the result:

import megengine.functional as F
import megengine.module as M
import megengine.module.init as init


class DownSample(M.Module):
    """A downsampling layer with an optional convolution.

    Args:
        in_ch: channels in the inputs and outputs.
        with_conv: if ``True``, apply a stride-2 convolution to downsample;
            otherwise use average pooling.
    """

    def __init__(self, in_ch, with_conv=True):
        super().__init__()
        self.with_conv = with_conv
        if with_conv:
            # No built-in padding: asymmetric padding is applied manually
            # in ``forward`` to match the original implementation.
            self.main = M.Conv2d(in_ch, in_ch, 3, stride=2)
        else:
            self.main = M.AvgPool2d(2, stride=2)

    def _initialize(self):
        for module in self.modules():
            if isinstance(module, M.Conv2d):
                init.xavier_uniform_(module.weight)
                init.zeros_(module.bias)

    def forward(self, x, temb):  # unused temb param keeps the interface uniform
        if self.with_conv:
            # Pad only on the right and bottom of the spatial dims,
            # mirroring the asymmetric padding in the PyTorch/TF code.
            x = F.nn.pad(x, [*[(0, 0) for _ in range(x.ndim - 2)], (0, 1), (0, 1)])
        return self.main(x)

[image: error comparison between the PyTorch and MegEngine outputs after the padding fix]
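For reference, here is a rough sketch of how such a comparison can be produced (the helper name, input shape, and the module handles torch_down/mge_down are hypothetical):

import numpy as np
import torch
import megengine as mge

def max_abs_diff(torch_fn, mge_fn, shape=(1, 64, 32, 32)):
    # Feed the same random input to both implementations and report
    # the largest elementwise deviation between their outputs.
    x = np.random.normal(size=shape).astype("float32")
    with torch.no_grad():
        out_torch = torch_fn(torch.from_numpy(x)).numpy()
    out_mge = mge_fn(mge.tensor(x)).numpy()
    return float(np.abs(out_torch - out_mge).max())

# e.g. max_abs_diff(torch_down, lambda x: mge_down(x, None))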

Btw, I'm also a beginner in DDPM; your blog helps me a lot!

ChaiByte commented 2 years ago

Got it. I'm not available at the moment; I will check the padding mode and #6 after my day off.

ChaiByte commented 2 years ago

The initial eps value has been updated, and I will close this issue now so we keep tracking the same topic in one place.

Feel free to reopen it if you have any questions or suggestions.