FlotingDream closed this issue 1 year ago
😬hiii
- There is no pooling operation in MHDLSA, since pooling would cause information loss for SR tasks.
- The squeeze factor is 4, chosen so the parameter count is comparable to MHSA.
- The kernel size is 7, compared to the 8×8 window size of MHSA.
- The ffn_expansion_factor is 2, compared to SwinIR.
- The kernel size of TLC is set to 48 (the training patch size) for the ×4 upscale factor.
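To keep these settings in one place, the hyper-parameters listed above can be summarized in a small config sketch (the key names here are hypothetical, not from the official code):

```python
# Hypothetical hyper-parameter summary of MHDLSA, collected from the list above.
mhdlsa_config = {
    "use_pooling": False,          # pooling would lose information for SR tasks
    "squeeze_factor": 4,           # keeps parameter count comparable to MHSA
    "kernel_size": 7,              # vs. the 8x8 window size of MHSA
    "ffn_expansion_factor": 2,     # vs. SwinIR
    "tlc_kernel_size": 48,         # training patch size, for the x4 upscale factor
}
```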
Tips: We reconstructed https://github.com/Atten4Vis/DemystifyLocalViT/blob/master/models/dwnet.py into MHDLSA, as described in the paper. We are grateful for the excellent work of https://github.com/Atten4Vis/DemystifyLocalViT/
🙏: We apologize for not extending our work to an ONNX model for mobile devices.
TODO: We'll soon release the archs file with more details about the network architecture.
We admit that our work advanced significantly thanks to the many details we worked out during the network structure design phase. We hope, however, that it helps you in applying the methods we proposed.
Thanks for your quick reply.
I only requested ONNX to get more details of the arch. If the official archs will be released soon, ONNX is not needed.
Thanks for your great work!
- ~~With no pooling, the dims don't seem to match? How does that work? Does the Reshape() in Eq. 1 combine H*W into 1? How? And what does G mean? Maybe I just missed this part of the implementation, thx.~~ It seems I missed the line "x denotes the pixel index", so now I have a clue, thx. So it is iDynamic? How are the G heads used? Could you give a code example for "The detailed network of the dynamic weight generation is shown in Figure 2. Similar to the multi-head self-attention methods [21, 23, 31], we divide the number of feature channels into G heads and learn separate dynamic weights in parallel"?
Literally, the brilliant iDynamic work from Atten4Vis is the foundation of our proposed MHDLSA. However, we devoted a lot of effort to reconstructing DemystifyLocalViT, because it was designed for high-level computer vision tasks; ultimately we arrived at our MHDLSA for SR tasks. As pointed out in the paper, to provide a more relevant attention map for dynamic convolution, we remove the redundant normalization layer and non-linear activation and add a depth-wise convolution (L262-L270 in Section 3.1 of the manuscript).
class IDynamicDWConv(nn.Module):
    def __init__(self, channels, kernel_size, group_channels):
        super(IDynamicDWConv, self).__init__()
        self.kernel_size = kernel_size
        self.channels = channels
        reduction_ratio = 4
        self.group_channels = group_channels
        self.groups = self.channels // self.group_channels
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, channels // reduction_ratio, 1),
            # nn.BatchNorm2d(channels // reduction_ratio),
            # nn.ReLU()
            # As mentioned, remove the redundant normalization and activation,
            # and add a depth-wise convolution.
            nn.Conv2d(channels // reduction_ratio, channels // reduction_ratio,
                      self.kernel_size, padding=self.kernel_size // 2,
                      groups=channels // reduction_ratio),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(channels // reduction_ratio, kernel_size**2 * self.groups, 1)
        )

    def forward(self, x):
        # Generate one k x k kernel per group per pixel (per-pixel dynamic weights).
        weight = self.conv2(self.conv1(x))
        b, c, h, w = weight.shape
        weight = weight.view(b, self.groups, self.kernel_size, self.kernel_size, h, w)
        out = _idynamic_cuda(x, weight, stride=1, padding=(self.kernel_size - 1) // 2)
        return out
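`_idynamic_cuda` is a custom CUDA op from the DemystifyLocalViT repo. As an assumption about its semantics (each pixel gets its own k×k kernel, shared by all channels within a group), a pure-PyTorch reference built on `F.unfold` might look like this, a sketch for understanding rather than a drop-in replacement:

```python
import torch
import torch.nn.functional as F

def idynamic_dwconv_ref(x, weight, stride=1, padding=3):
    # x:      (B, C, H, W) input feature map
    # weight: (B, G, K, K, H, W) per-pixel kernels, shared within each channel group
    # Assumes stride=1 and padding=K//2 so the output keeps the input spatial size.
    b, c, h, w = x.shape
    _, g, k = weight.shape[:3]
    # Extract K x K neighborhoods: (B, C*K*K, H*W), channel-major ordering
    patches = F.unfold(x, kernel_size=k, padding=padding, stride=stride)
    patches = patches.view(b, g, c // g, k * k, h * w)
    w_flat = weight.reshape(b, g, 1, k * k, h * w)
    # Weighted sum over the kernel window (correlation, like nn.Conv2d)
    out = (patches * w_flat).sum(dim=3)
    return out.reshape(b, c, h, w)
```

When the generated kernels happen to be spatially constant, this reduces to an ordinary grouped depth-wise convolution, which gives a quick sanity check against `F.conv2d`.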
That's clear, thx!
One more thing: what should group_channels be set to?
When setting inhomogeneous=True, heads is set to 6 (compared to SwinIR) in the DWBlock -> https://github.com/Atten4Vis/DemystifyLocalViT/blob/master/models/dwnet.py
Maybe the paper https://github.com/Atten4Vis/DemystifyLocalViT/ will help you understand the key designs of the inhomogeneous DW convolution and our proposed MHDLSA.
class DWBlock(nn.Module):
    def __init__(self, dim, window_size, dynamic=False, inhomogeneous=False, heads=None):
        super().__init__()
        self.dim = dim
        self.window_size = window_size  # Wh, Ww
        self.dynamic = dynamic
        self.inhomogeneous = inhomogeneous
        self.heads = heads
        # remove the BatchNorm according to the research of EDSR.
        # remove the redundant activation.
        # pw-linear
        self.conv0 = nn.Conv2d(dim, dim, 1, bias=False)
        # self.bn0 = nn.BatchNorm2d(dim)
        if dynamic and not inhomogeneous:
            self.conv = DynamicDWConv(dim, kernel_size=window_size, stride=1,
                                      padding=window_size // 2, groups=dim)
        elif dynamic and inhomogeneous:
            # elif, so the static branch below cannot overwrite the dynamic conv
            print(window_size, heads)
            self.conv = IDynamicDWConv(dim, window_size, heads)
        else:
            self.conv = nn.Conv2d(dim, dim, kernel_size=window_size, stride=1,
                                  padding=window_size // 2, groups=dim)
        # self.bn = nn.BatchNorm2d(dim)
        # self.relu = nn.ReLU(inplace=True)
        # pw-linear
        self.conv2 = nn.Conv2d(dim, dim, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(dim)
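The snippet above only shows `__init__`. Assuming the order suggested by the comments (pw-linear → depth-wise conv → pw-linear → BN), a hypothetical minimal forward for the static (dynamic=False) branch might look like this; `DWBlockSketch` is an illustration, not the official class:

```python
import torch
import torch.nn as nn

class DWBlockSketch(nn.Module):
    # Minimal sketch of the static branch of DWBlock (dynamic=False),
    # assuming the layer order implied by the comments above.
    def __init__(self, dim, window_size):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 1, bias=False)      # pw-linear
        self.conv = nn.Conv2d(dim, dim, kernel_size=window_size, stride=1,
                              padding=window_size // 2, groups=dim)  # depth-wise conv
        self.conv2 = nn.Conv2d(dim, dim, 1, bias=False)      # pw-linear
        self.bn2 = nn.BatchNorm2d(dim)

    def forward(self, x):
        # pw-linear -> depth-wise conv -> pw-linear -> BN
        return self.bn2(self.conv2(self.conv(self.conv0(x))))
```

With an odd window_size such as the kernel size 7 used here, padding = window_size // 2 keeps the spatial resolution unchanged, which matters for SR.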
That's clear, thx!
Hi, I am wondering if you could share the arch/model?
I'm trying to implement it myself and have run into some problems; any help would be great.
If you could share an ONNX model file, e.g. via onnx.save(onnx.shape_inference.infer_shapes(onnx_model), model_file), it would help me inspect the arch structure in netron.app.
Anyway, I'm still looking forward to the official arch's release.
Thx!