chineseocr / darknet-ocr

darknet text detect and darknet cnn ocr
MIT License

About the model structure #97

Open yuxx0218 opened 4 years ago

yuxx0218 commented 4 years ago

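I reimplemented this network in PyTorch from model/ocr/chinese/ocr.cfg and fed it an image of shape [1, 1, 32, 256] as required, but the network output has shape [1, 11316, 3, 63]. What does this output mean? My understanding is that the output should be [1, 11316, 1, n], where 11316 is the number of character classes (one probability per Chinese character) and n is the length of the generated text sequence. I can't tell what went wrong, so any guidance is appreciated! (I have never used darknet, so I don't know how to inspect the actual implementation of the network structure.)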

```python
import torch.nn as nn


class CRNN(nn.Module):
    def __init__(self, imgC):
        super(CRNN, self).__init__()
        # Conv2d arguments: in_channels, out_channels, kernel, stride, padding
        self.conv1 = nn.Conv2d(imgC, 64, 3, 1, 1)
        self.relu1 = nn.ReLU()
        self.mpool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(64, 128, 3, 1, 1)
        self.relu2 = nn.ReLU()
        self.mpool2 = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(128, 256, 3, 1, 1)
        self.relu3 = nn.ReLU()
        self.conv4 = nn.Conv2d(256, 256, 3, 1, 1)
        self.relu4 = nn.ReLU()
        # Kernel 2, stride (2, 1): halves the height, steps the width by 1
        self.mpool3 = nn.MaxPool2d(2, (2, 1), 0)
        self.conv5 = nn.Conv2d(256, 512, 3, 1, 1)
        self.relu5 = nn.ReLU()
        self.conv6 = nn.Conv2d(512, 512, 3, 1, 1)
        self.relu6 = nn.ReLU()
        self.mpool4 = nn.MaxPool2d(2, (2, 1), 0)
        self.conv7 = nn.Conv2d(512, 512, 2, 1, 0)
        self.relu7 = nn.ReLU()
        # 1x1 convolution with padding=1, as originally posted
        self.conv8 = nn.Conv2d(512, 11316, 1, 1, 1)

    def forward(self, x):
        x = self.mpool1(self.relu1(self.conv1(x)))
        x = self.mpool2(self.relu2(self.conv2(x)))
        x = self.relu3(self.conv3(x))
        x = self.mpool3(self.relu4(self.conv4(x)))
        x = self.relu5(self.conv5(x))
        x = self.mpool4(self.relu6(self.conv6(x)))
        x = self.relu7(self.conv7(x))
        x = self.conv8(x)
        return x
```

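
The reported output shape can be checked by hand with the standard output-size formula for convolution and pooling layers, out = floor((in + 2*pad - kernel) / stride) + 1. The following is only a sketch in plain Python: the helper names `out_size`, `LAYERS`, and `trace` are made up for illustration and are not part of darknet or PyTorch. The layer parameters are copied from the class above.

```python
def out_size(n, kernel, stride, pad):
    """Output length along one axis for a conv or max-pool layer."""
    return (n + 2 * pad - kernel) // stride + 1


# (name, kernel, (stride_h, stride_w), pad), in the order used by forward()
LAYERS = [
    ("conv1", 3, (1, 1), 1), ("mpool1", 2, (2, 2), 0),
    ("conv2", 3, (1, 1), 1), ("mpool2", 2, (2, 2), 0),
    ("conv3", 3, (1, 1), 1), ("conv4", 3, (1, 1), 1),
    ("mpool3", 2, (2, 1), 0),
    ("conv5", 3, (1, 1), 1), ("conv6", 3, (1, 1), 1),
    ("mpool4", 2, (2, 1), 0),
    ("conv7", 2, (1, 1), 0),
]


def trace(h, w, conv8_pad=1):
    """Return the (height, width) of the conv8 output for an h x w input."""
    for _name, k, (sh, sw), p in LAYERS:
        h, w = out_size(h, k, sh, p), out_size(w, k, sw, p)
    # conv8 is a 1x1 convolution with stride 1; only its padding varies here
    return out_size(h, 1, 1, conv8_pad), out_size(w, 1, 1, conv8_pad)


print(trace(32, 256))  # (3, 63)
```

For a 32 x 256 input the feature map entering conv8 is 1 x 61, and a padding of 1 on a 1x1 convolution adds a border of one pixel on every side, which accounts for the 3 x 63 result. The `conv8_pad` argument makes it easy to try other padding values.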
yuxx0218 commented 4 years ago

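I seem to have solved it: changing the padding of conv8 to 0 fixes the problem, and the output becomes [1, 11316, 1, 61]. Was the padding a slip on the author's part? Also, does 11316 stand for 11315 Chinese characters plus 1 blank? Is 61 the sequence length? And is there a sample set that covers all 11315 characters?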