AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

the route of yolov4-tiny.cfg #6086

XDhughie opened this issue 4 years ago (Open)

XDhughie commented 4 years ago

```
[route]
layers=-1
groups=2
group_id=1
```

What does "groups=2 group_id=1" mean?

WongKinYiu commented 4 years ago

Split the previous layer (layers=-1) into two parts along the channel dimension (groups=2) and route the second part (group_id=1; ids start from 0).

ai815 commented 4 years ago

> Split the previous layer (layers=-1) into two parts along the channel dimension (groups=2) and route the second part (group_id=1; ids start from 0).

Hey, can I ask what "channel" means here?

WongKinYiu commented 4 years ago

The size of a feature map is width × height × channel.

ai815 commented 4 years ago

> The size of a feature map is width × height × channel.

Oh I see, thanks. But how can we separate a layer into two parts? Can you please explain?

WongKinYiu commented 4 years ago

If width × height × channel is w × h × c, the feature map is fm[0:w, 0:h, 0:c]. groups=2, group_id=0 gives fm[0:w, 0:h, 0:c/2], and groups=2, group_id=1 gives fm[0:w, 0:h, c/2:c].
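
In PyTorch terms this is plain channel slicing (a minimal sketch, assuming NCHW layout, so the channel dimension is dim 1):

```python
import torch

x = torch.randn(1, 64, 26, 26)           # previous layer output, c = 64
g0, g1 = x.chunk(2, dim=1)               # groups=2: split along the channel dim
# group_id=0 -> first half, group_id=1 -> second half
assert g0.shape == (1, 32, 26, 26)
assert g1.shape == (1, 32, 26, 26)
assert torch.equal(g1, x[:, 32:, :, :])  # same as fm[0:w, 0:h, c/2:c]
```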

ankandrew commented 4 years ago

@WongKinYiu I don't understand why (visually, in Netron) the number of channels is 64.

[Netron screenshot of the route layer]

Shouldn't it be 32 because of c/2?

WongKinYiu commented 4 years ago

Maybe it is because darknet uses group_id but netron uses groups_id:
https://github.com/AlexeyAB/darknet/blob/master/src/parser.c#L1036
https://github.com/lutzroeder/netron/blob/master/src/darknet-metadata.json#L281

You can try to modify the code of netron and generate the graph again.

ankandrew commented 4 years ago

@WongKinYiu I just tried changing group_id to groups_id, but both produce the same result. I should open an issue in netron, I guess.

ShaneHsieh commented 4 years ago

@WongKinYiu Hi,

Does yolov4-tiny only use group_id=1? Doesn't that throw away many channels?

WongKinYiu commented 4 years ago

No, the cross-stage connection routes all of the channels of the base layer in yolov4-tiny.

ShaneHsieh commented 4 years ago

@WongKinYiu Now I understand. You mean that only this route layer uses half of the channels, but the previous layer, which has all of the channels, will be concatenated back later?

WongKinYiu commented 4 years ago

[image: diagram of the yolov4-tiny cross-stage connection]
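
Schematically, the corresponding block in yolov4-tiny.cfg looks roughly like this (convolutional entries abbreviated; filter counts are from the first CSP block):

```
[convolutional]    # base layer, 64 filters

[route]
layers=-1
groups=2
group_id=1         # take the second half: 32 channels

[convolutional]    # 32 filters
[convolutional]    # 32 filters

[route]
layers=-1,-2       # concat the two 32-channel convs -> 64 channels

[convolutional]    # 64 filters, 1x1

[route]
layers=-6,-1       # concat all 64 channels of the base layer with the block output

[maxpool]
```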

hhaAndroid commented 4 years ago

Is this right?

```python
import torch
import torch.nn as nn

# Conv2dBatchLeaky (presumably Conv2d + BatchNorm2d + LeakyReLU) is defined
# elsewhere in the same vn_layer module.

class ResConv2dBatchLeaky(nn.Module):

    def __init__(self, in_channels, inter_channels, kernel_size, stride=1, leaky_slope=0.1, return_extra=False):
        super(ResConv2dBatchLeaky, self).__init__()

        self.return_extra = return_extra
        self.in_channels = in_channels
        self.inter_channels = inter_channels
        self.kernel_size = kernel_size
        self.stride = stride
        if isinstance(kernel_size, (list, tuple)):
            self.padding = [int(ii / 2) for ii in kernel_size]
        else:
            self.padding = int(kernel_size / 2)
        self.leaky_slope = leaky_slope

        self.layers0 = Conv2dBatchLeaky(self.in_channels // 2, self.inter_channels, self.kernel_size, self.stride,
                                        self.padding)
        self.layers1 = Conv2dBatchLeaky(self.inter_channels, self.inter_channels, self.kernel_size, self.stride,
                                        self.padding)
        self.layers2 = Conv2dBatchLeaky(self.in_channels, self.in_channels, 1, 1, 0)

    def forward(self, x):
        y0 = x
        channel = x.shape[1]
        x0 = x[:, channel // 2:, ...]    # route groups=2 group_id=1: second half of the channels
        x1 = self.layers0(x0)
        x2 = self.layers1(x1)
        x3 = torch.cat((x2, x1), dim=1)  # route layers=-1,-2
        x4 = self.layers2(x3)
        x = torch.cat((y0, x4), dim=1)   # cross-stage connection: concat the full base layer
        if self.return_extra:
            return x, x4
        else:
            return x
```
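
A quick sanity check of the channel arithmetic (a sketch; it assumes Conv2dBatchLeaky is a spatial-size-preserving Conv2d + BatchNorm2d + LeakyReLU block, as the padding above suggests):

```python
import torch

block = ResConv2dBatchLeaky(64, 32, 3)
x = torch.randn(1, 64, 104, 104)
y = block(x)
# cat(input: 64 ch, x4: 64 ch) -> 128 channels, same spatial size
assert y.shape == (1, 128, 104, 104)
```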

@WongKinYiu

hhaAndroid commented 4 years ago

Is the whole model structure as follows? Is this right?

```python
import torch
import torch.nn as nn
from collections import OrderedDict

import vn_layer  # the user's module containing Conv2dBatchLeaky and ResConv2dBatchLeaky


class TinyYolov4(nn.Module):

    def __init__(self, pretrained=False):
        super(TinyYolov4, self).__init__()

        # Network
        backbone = OrderedDict([
            ('0_convbatch', vn_layer.Conv2dBatchLeaky(3, 32, 3, 2)),
            ('1_convbatch', vn_layer.Conv2dBatchLeaky(32, 64, 3, 2)),
            ('2_convbatch', vn_layer.Conv2dBatchLeaky(64, 64, 3, 1)),
            ('3_resconvbatch', vn_layer.ResConv2dBatchLeaky(64, 32, 3, 1)),
            ('4_max', nn.MaxPool2d(2, 2)),
            ('5_convbatch', vn_layer.Conv2dBatchLeaky(128, 128, 3, 1)),
            ('6_resconvbatch', vn_layer.ResConv2dBatchLeaky(128, 64, 3, 1)),
            ('7_max', nn.MaxPool2d(2, 2)),
            ('8_convbatch', vn_layer.Conv2dBatchLeaky(256, 256, 3, 1)),
            ('9_resconvbatch', vn_layer.ResConv2dBatchLeaky(256, 128, 3, 1, return_extra=True)),
        ])

        head = [
            OrderedDict([
                ('10_max', nn.MaxPool2d(2, 2)),
                ('11_conv', vn_layer.Conv2dBatchLeaky(512, 512, 3, 1)),
                ('12_conv', vn_layer.Conv2dBatchLeaky(512, 256, 1, 1)),
            ]),

            OrderedDict([
                ('13_conv', vn_layer.Conv2dBatchLeaky(256, 512, 3, 1)),
                ('14_conv', nn.Conv2d(512, 3 * (5 + 80), 1)),  # 3 anchors * (5 + 80 classes)
            ]),

            OrderedDict([
                ('15_convbatch', vn_layer.Conv2dBatchLeaky(256, 128, 1, 1)),
                ('16_upsample', nn.Upsample(scale_factor=2)),
            ]),

            OrderedDict([
                ('17_convbatch', vn_layer.Conv2dBatchLeaky(384, 256, 3, 1)),
                ('18_conv', nn.Conv2d(256, 3 * (5 + 80), 1)),
            ]),
        ]

        self.backbone = nn.Sequential(backbone)
        self.head = nn.ModuleList([nn.Sequential(layer_dict) for layer_dict in head])
        self.init_weights(pretrained)  # init_weights is not shown in this thread

    def forward(self, x):
        # The last backbone block has return_extra=True, so it yields (stem, extra_x).
        stem, extra_x = self.backbone(x)
        stage0 = self.head[0](stem)
        head0 = self.head[1](stage0)  # coarse scale (13x13 for a 416 input)

        stage1 = self.head[2](stage0)
        stage2 = torch.cat((stage1, extra_x), dim=1)
        head1 = self.head[3](stage2)  # fine scale (26x26 for a 416 input)
        head = [head1, head0]
        return head
```

WongKinYiu commented 4 years ago

I think yes.
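
A sketch of the expected output shapes for a 416×416 input (assuming the definitions above and an init_weights method, which is not shown in this thread):

```python
import torch

model = TinyYolov4()
x = torch.randn(1, 3, 416, 416)
head1, head0 = model(x)
# 3 anchors * (5 + 80 classes) = 255 channels per scale
assert head0.shape == (1, 255, 13, 13)  # coarse scale
assert head1.shape == (1, 255, 26, 26)  # fine scale (after upsample + concat)
```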

willbattel commented 4 years ago

How come v4 tiny uses the route groups in its cfg file, but the full v4 cfg does not? It looks like they're both CSP-based, but I'm not sure why only v4 tiny uses the route groups. Is it just to increase speed by cutting the feature size in half?

WongKinYiu commented 4 years ago

groups and group_id were implemented after CSPNet was designed, so newer CSP models such as yolov4-tiny implement CSP using route groups.

willbattel commented 4 years ago

Interesting. So if YOLOv4 had not been published until later, it would have used the route groups? And I assume the functionality would have been the same as it is currently, just a different implementation? Thanks.

WongKinYiu commented 4 years ago

Yes, the two implementations are equivalent.

WongKinYiu commented 4 years ago

yolov4 is based on ResNet: it splits the channels in the base layer and removes the bottleneck of the res layers, so the two paths are {2x, 2x}. yolov4-tiny is based on VoVNet: if we split the channels in the base layer too, the two paths would be {1x, 3x}, so instead we split inside the computational block to make them {2x, 2x}. This modification is used to optimize memory bandwidth.

Lowell-IC commented 4 years ago

If width × height × channel is w × h × c, the feature map is fm[0:w-1, 0:h-1, 0:c-1] (inclusive indices). groups=2, group_id=0 gives fm[0:w-1, 0:h-1, 0:c/2-1], and groups=2, group_id=1 gives fm[0:w-1, 0:h-1, c/2:c-1].