NeonLeexiang / DLGSANet

Apache License 2.0

about arch #3

Closed FlotingDream closed 1 year ago

FlotingDream commented 1 year ago

Hi, I am wondering if you could share the arch/model?

Otherwise, I'm trying to implement it myself and have run into some problems. It would be great if you could help.

  1. The dynamic weight part of MHDLSA: for Eq. 1, I referred to https://github.com/Atten4Vis/DemystifyLocalViT/blob/master/models/dwnet.py. Maybe a pooling layer is used first?
  2. In Eq. 1, what is the squeeze factor?
  3. In Eq. 1, what is the kernel size?
  4. For the FFN, I referred to https://github.com/swz30/Restormer/blob/main/basicsr/models/archs/restormer_arch.py. Maybe ffn_expansion_factor is 1?
  5. Could you give some help with testing TLC?

If you could share an ONNX model file, e.g. saved with onnx.save(onnx.shape_inference.infer_shapes(onnx_model), model_file), it would help me inspect the arch structure in netron.app.
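For concreteness, a minimal sketch of the export I have in mind (the stand-in model and file names are placeholders, not anything from this repo):

```python
import torch
import onnx

# Stand-in model; the real DLGSANet network would go here.
model = torch.nn.Conv2d(3, 3, 3, padding=1)
dummy_input = torch.randn(1, 3, 48, 48)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=13)

# Re-save with inferred shapes so netron.app shows tensor dims on every edge.
onnx_model = onnx.load("model.onnx")
onnx.save(onnx.shape_inference.infer_shapes(onnx_model), "model_shapes.onnx")
```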

Anyway, I'm still looking forward to the official arch's release.

Thx!

NeonLeexiang commented 1 year ago

😬hiii

  1. There is no pooling operation in MHDLSA, since pooling would cause some information loss for SR tasks.
  2. To keep the parameter count comparable to MHSA, the squeeze factor is 4.
  3. The kernel size is 7, compared to the window size of 8 in MHSA.
  4. The ffn_expansion_factor is 2, following SwinIR (a sketch follows this list).
  5. The kernel size of TLC is set to 48 (the training patch size) for the x4 upscale factor.
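For item 4, here is a minimal sketch of the Restormer-style gated FFN with ffn_expansion_factor=2. The class name and exact wiring are illustrative assumptions rather than our released archs file; see restormer_arch.py for the original:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFeedForward(nn.Module):
    # Sketch of Restormer's gated-Dconv feed-forward with expansion factor 2.
    def __init__(self, dim, ffn_expansion_factor=2, bias=False):
        super().__init__()
        hidden = int(dim * ffn_expansion_factor)
        # 1x1 conv produces the gate and value branches at once
        self.project_in = nn.Conv2d(dim, hidden * 2, kernel_size=1, bias=bias)
        # depth-wise 3x3 conv applied to both branches
        self.dwconv = nn.Conv2d(hidden * 2, hidden * 2, kernel_size=3,
                                padding=1, groups=hidden * 2, bias=bias)
        self.project_out = nn.Conv2d(hidden, dim, kernel_size=1, bias=bias)

    def forward(self, x):
        x1, x2 = self.dwconv(self.project_in(x)).chunk(2, dim=1)
        return self.project_out(F.gelu(x1) * x2)
```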

Tips: We reconstructed https://github.com/Atten4Vis/DemystifyLocalViT/blob/master/models/dwnet.py into MHDLSA, as indicated in the paper. We are grateful for the excellent work at https://github.com/Atten4Vis/DemystifyLocalViT/

🙏: We apologize for not having extended our work to an ONNX model for mobile devices.

TODO: Soon, we'll make the archs file available with more information about the network architecture.

We admit that our work advanced significantly as a result of settling many details during the network structure design phase. We hope, however, that the above helps you apply the methods we proposed.

FlotingDream commented 1 year ago

Thx for your quick reply.

  1. ~With no pooling, the dims don't seem to match? How does that work? Does the Reshape() in Eq. 1 combine H*W into 1? How? And what does G mean? Maybe I just missed this part of the whole implementation. Thx.~ It seems I missed the line "x denotes the pixel index", which gives me a clue, thx. So is it IDynamic, with G heads? And maybe a code example for "The detailed network of the dynamic weight generation is shown in Figure 2. Similar to the multi-head self-attention methods [21, 23, 31], we divide the number of feature channels into G heads and learn separate dynamic weights in parallel"? Points 2-5: fine.

The ONNX request was just for more details of the arch; if the official archs will be released soon, ONNX is not needed.

Thx for your good work!

NeonLeexiang commented 1 year ago

Literally, the brilliant IDynamic work from Atten4Vis is the foundation of our proposed MHDLSA. We devoted a lot of effort to reconstructing DemystifyLocalViT, though, because it was designed for high-level computer vision tasks; ultimately, we proposed MHDLSA for SR tasks. As pointed out in the paper, in order to provide a more relevant attention map for the dynamic convolution, we remove the redundant normalization layer and non-linear activation and add a depth-wise convolution (L262-L270 in Section 3.1 of the manuscript).

NeonLeexiang commented 1 year ago

😬hiii

  1. There is no pooling operation in MHDLSA since doing so would result in some information loss for SR tasks.
  2. Due to the comparable parameter count with MHSA, the squeeze factor number is 4.
  3. The num of kernel size is 7, compares to the 8 window size of MHSA.
  4. The fnn_expansion_factor is 2, compares to the SwinIR.
  5. The kernel size of TLC is set to 48 (the training patch size) for the x4 upscale factor.

Tips: We reconstruct the https://github.com/Atten4Vis/DemystifyLocalViT/blob/master/models/dwnet.py into MHDLSA, as indicated in the paper. We are grateful for the excellent work done by the https://github.com/Atten4Vis/DemystifyLocalViT/ 🙏: We apologize for not extending our work to the onnx model for mobile devices.

TODO: Soon, we'll make the archs file available with more information about the network architecture.

We admit that our work has advanced significantly as a result of our completion of several details during the network structure design phase. I do, however, hope that it might help you in applying the methods that we proposed.

thx your quick replay.

  1. ~no pool seems the dim do not mach? how to get that work? the Reshape() in eq.1 combine H*W to 1? how? and what is G mean? maybe I just miss this part in the whole implemente. thx ~ seems I miss the line x denotes the pixel index. have a clue thx, so it is idynamic? G head use? and maybe a code example for The detailed network of the dynamic weight generation is shown in Figure 2. Similar to the multi-head selfattention methods [21, 23, 31], we divide the number of feature channels into G heads and learn separate dynamic weights in parallel 2-5. fine

onnx just request for more detiles of arch. if official arches will release soon. onnx is not needed. thx your good work!

Literally, the brilliant work of idyanamic from Atten4vis is the foundation for our proposed MHDLSA. We devoted a lot of effort to reconstructing the DemystifyLocalViT, though, because it was designed for high-level computer vision tasks. Ultimately, we proposed our MHDLSA for SR tasks. As pointed out in the paper, in order to provide a more relevant attention map for dynamic convolution, we remove the redundant normalization layer and non-linear activation and add a depth-wise convolution. (L262-L270 in Section 3.1 in the manuscript).



```python
import torch
import torch.nn as nn

# _idynamic_cuda is presumably the CUDA operator built from the
# DemystifyLocalViT repo (the idynamic extension); it is not a stock
# PyTorch function and must be compiled and imported separately.


class IDynamicDWConv(nn.Module):

    def __init__(self, channels, kernel_size, group_channels):
        super(IDynamicDWConv, self).__init__()
        self.kernel_size = kernel_size
        self.channels = channels
        reduction_ratio = 4  # the squeeze factor mentioned above
        self.group_channels = group_channels
        self.groups = self.channels // self.group_channels
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, channels // reduction_ratio, 1),
            # nn.BatchNorm2d(channels // reduction_ratio),
            # nn.ReLU()
            # As mentioned, remove the redundant normalization and activation
            # and add a depth-wise convolution instead.
            nn.Conv2d(channels // reduction_ratio, channels // reduction_ratio,
                      self.kernel_size, padding=self.kernel_size // 2,
                      groups=channels // reduction_ratio),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(channels // reduction_ratio, kernel_size ** 2 * self.groups, 1)
        )

    def forward(self, x):
        # One k x k kernel per group per pixel (the dynamic weights of Eq. 1)
        weight = self.conv2(self.conv1(x))
        b, c, h, w = weight.shape
        weight = weight.view(b, self.groups, self.kernel_size, self.kernel_size, h, w)
        out = _idynamic_cuda(x, weight, stride=1, padding=(self.kernel_size - 1) // 2)
        return out
```
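In case the CUDA extension is unavailable, here is a minimal pure-PyTorch sketch of what we understand _idynamic_cuda to compute: a per-pixel depth-wise convolution whose k x k weights are shared within each channel group. This is an unofficial, unoptimized reference under that assumption, not the kernel itself:

```python
import torch
import torch.nn.functional as F

def idynamic_reference(x, weight, stride=1, padding=None):
    # x: (B, C, H, W); weight: (B, G, K, K, H, W) as produced above
    B, C, H, W = x.shape
    _, G, K, _, _, _ = weight.shape
    if padding is None:
        padding = (K - 1) // 2  # keep the spatial size (stride must be 1)
    # Gather the K x K neighborhood of every pixel: (B, C*K*K, H*W)
    patches = F.unfold(x, kernel_size=K, stride=stride, padding=padding)
    patches = patches.reshape(B, G, C // G, K * K, H, W)
    w = weight.reshape(B, G, 1, K * K, H, W)  # shared within each group
    out = (patches * w).sum(dim=3)            # weighted sum over the window
    return out.reshape(B, C, H, W)
```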
FlotingDream commented 1 year ago

That's clear. Thx!

FlotingDream commented 1 year ago

One more thing: what should group_channels be set to?

NeonLeexiang commented 1 year ago

While setting inhomogeneous=True, heads is set to 6 (comparable to SwinIR) in DWBlock (https://github.com/Atten4Vis/DemystifyLocalViT/blob/master/models/dwnet.py); note that DWBlock passes heads as the group_channels argument of IDynamicDWConv.

Maybe the paper behind https://github.com/Atten4Vis/DemystifyLocalViT/ will help you understand the key designs of the inhomogeneous depth-wise convolution and our proposed MHDLSA.

```python
class DWBlock(nn.Module):

    def __init__(self, dim, window_size, dynamic=False, inhomogeneous=False, heads=None):
        super().__init__()
        self.dim = dim
        self.window_size = window_size  # Wh, Ww
        self.dynamic = dynamic
        self.inhomogeneous = inhomogeneous
        self.heads = heads

        # Remove the BatchNorm, following the findings of EDSR,
        # and remove the redundant activation.

        # pw-linear
        self.conv0 = nn.Conv2d(dim, dim, 1, bias=False)
        # self.bn0 = nn.BatchNorm2d(dim)

        # elif keeps the first branch from being overwritten by the else below
        if dynamic and not inhomogeneous:
            self.conv = DynamicDWConv(dim, kernel_size=window_size, stride=1,
                                      padding=window_size // 2, groups=dim)
        elif dynamic and inhomogeneous:
            # heads is forwarded as the group_channels argument of IDynamicDWConv
            self.conv = IDynamicDWConv(dim, window_size, heads)
        else:
            self.conv = nn.Conv2d(dim, dim, kernel_size=window_size, stride=1,
                                  padding=window_size // 2, groups=dim)

        # self.bn = nn.BatchNorm2d(dim)
        # self.relu = nn.ReLU(inplace=True)

        # pw-linear
        self.conv2 = nn.Conv2d(dim, dim, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(dim)
```
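A quick usage sketch; dim here is illustrative (only window_size=7 and heads=6 come from this thread), and the snippet above omits DWBlock.forward, so we call the inner convolution directly:

```python
# Requires the idynamic CUDA extension (or swap in idynamic_reference above).
block = DWBlock(dim=60, window_size=7, dynamic=True, inhomogeneous=True, heads=6).cuda()
x = torch.randn(1, 60, 48, 48, device="cuda")  # 48 is the training patch size
y = block.conv(x)  # IDynamicDWConv with group_channels=6; output keeps x's shape
```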
FlotingDream commented 1 year ago

That's clear, thx!