apache / singa

a distributed deep learning platform
Apache License 2.0

AlexNet backward shape mismatch + ReLU returns a tuple #681

Open Belegkarnil opened 4 years ago

Belegkarnil commented 4 years ago


I have implemented AlexNet in SINGA, but I get an error during the `backward_and_update` call. I am using SINGA 3.0.0.rc1 on CPU.

This is my AlexNet implementation:

```python
from singa import autograd
from singa import module
from singa import opt

__all__ = ['AlexNet', 'alexnet']


class AlexNet(module.Module):

    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        # 12 on GPU, so 6 & 6
        self.features1 = [
            # ...
            autograd.MaxPool2d(kernel_size=3, stride=2),
            # ...
            autograd.MaxPool2d(kernel_size=3, stride=2),
            # ...
            autograd.Conv2d(384, 256, kernel_size=3, padding=1),
        ]
        self.features2 = [
            autograd.Conv2d(256, 256, kernel_size=3, padding=1),
            autograd.MaxPool2d(kernel_size=3, stride=2),
        ]
        self.avgpool = autograd.AvgPool2d(6, stride=1)
        self.flatten = autograd.Flatten()
        self.classifier = [
            autograd.Linear(256 * 6 * 6, 4096),
            autograd.Linear(4096, 4096),
            autograd.Linear(4096, num_classes),
        ]
        self.optimizer = opt.SGD(lr=0.001, momentum=0.9)

    def loss(self, out, ty):
        return autograd.softmax_cross_entropy(out, ty)

    def optim(self, loss, dist_option, spars):
        if dist_option == 'fp32':
            ...
        elif dist_option == 'fp16':
            ...
        elif dist_option == 'partialUpdate':
            ...
        elif dist_option == 'sparseTopK':
            self.optimizer.backward_and_sparse_update(loss, topK=True, spars=spars)
        elif dist_option == 'sparseThreshold':
            self.optimizer.backward_and_sparse_update(loss, topK=False, spars=spars)

    def forward(self, x):
        for (i, layers) in enumerate([self.features1, self.features2,
                                      [self.avgpool, self.flatten],
                                      self.classifier]):
            for (j, fn) in enumerate(layers):
                x = fn(x)
                if type(x) is tuple:  # FIXME I have to do that because of a bug in Singa? (ReLU)
                    x = x[0]
        return x


def alexnet(**kwargs):
    return AlexNet(**kwargs)
```

And I get:

`AssertionError: ('shape mismatch', (9216, 4096), (256, 4096))`

which is my first linear layer: `256 * 6 * 6, 4096`.

When I use my VGG16 implementation, I get a similar error: `AssertionError: ('shape mismatch', (25088, 4096), (512, 4096))`

It seems that the backward operation does not map the correct shape to the corresponding layer.

Moreover, the ReLU class returns a 1-tuple containing a Tensor. Is this intended, or is it a bug?

dcslin commented 4 years ago

Hi, as pointed out by @chrishkchris, the convention is to use ReLU as a stateless layer. Usage: https://github.com/apache/singa/blob/master/examples/cnn/model/cnn.py#L40

For the shape mismatch, you might need to check the layer shapes again. Let me know if further info is required.
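
A possible reading of the 1-tuple, sketched in plain Python rather than with SINGA itself: an operator object can return all of its outputs as a tuple (some operators have several), while a functional wrapper unwraps the single result. The names `Operation`, `ReLU` and `relu` below are illustrative stand-ins that only mimic that pattern; they are not SINGA's actual code.

```python
class Operation:
    """Toy operator base class: calling it always returns a tuple of outputs."""

    def __call__(self, *xs):
        ys = self.forward(*xs)
        # Normalise to a tuple so single- and multi-output operators look alike.
        return ys if isinstance(ys, tuple) else (ys,)

    def forward(self, *xs):
        raise NotImplementedError


class ReLU(Operation):
    """Toy ReLU on a list of floats (a stand-in for a tensor)."""

    def forward(self, x):
        return [max(0.0, v) for v in x]


def relu(x):
    """Functional (stateless) wrapper: unwrap the single output."""
    return ReLU()(x)[0]


raw = ReLU()([-1.0, 2.0])      # a 1-tuple holding the result
unwrapped = relu([-1.0, 2.0])  # the result itself
print(type(raw).__name__, unwrapped)
```

Under this pattern, calling the operator class directly inside a list of layers would require the `x = x[0]` workaround from the pasted `forward`, whereas the functional form would not.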

Belegkarnil commented 4 years ago

OK, I'll try, but why provide a stateful ReLU layer? Is it for a specific purpose?

Belegkarnil commented 4 years ago

I compared my implementation to other frameworks and the shapes are the same. Moreover, the forward pass does not cause any issue; the error occurs in the backward pass. This is why I suspect a bug. Is that possible?
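
One way to read the two shapes in the assertion, assuming the first is the declared weight shape and the second is the shape of the gradient computed in the backward pass: for a linear layer `y = x @ W`, the weight gradient is `x.T @ dy`, so its first dimension follows the actual input width rather than the declared one. The helper names and the batch size below are hypothetical, and the sketch is plain Python shape arithmetic, not SINGA code.

```python
def matmul_shape(a, b):
    """Shape of a @ b for 2-D shapes; raises if the inner dimensions differ."""
    if a[1] != b[0]:
        raise ValueError(f"inner dimensions differ: {a} @ {b}")
    return (a[0], b[1])


def weight_grad_shape(x_shape, dy_shape):
    """Shape of dW = x.T @ dy for a linear layer y = x @ W."""
    x_t = (x_shape[1], x_shape[0])
    return matmul_shape(x_t, dy_shape)


batch = 8                         # hypothetical batch size
declared_w = (256 * 6 * 6, 4096)  # weight shape declared in the model
x_shape = (batch, 256)            # input, if the flattened features hold 256 values
dy_shape = (batch, 4096)          # gradient flowing back from the layer output

grad_w = weight_grad_shape(x_shape, dy_shape)
print(declared_w, grad_w)         # the two shapes to compare
```

If that reading is right, the pair printed here corresponds to the pair reported in the error, which would point at the width of the flattened input rather than at the backward pass itself.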

nudles commented 4 years ago

@dcslin Did you try to run the code pasted by @Belegkarnil ? Can you reproduce the error?

dcslin commented 4 years ago

I am still checking the code

dcslin commented 4 years ago

Hi @Belegkarnil, you might need to change `256 * 6 * 6, 4096` to `256, 4096` to make it work.

Also, we recommend using relu/dropout/flatten like this: https://github.com/apache/singa/blob/master/examples/cnn/model/cnn.py#L40

Belegkarnil commented 4 years ago

OK, thanks a lot! I assumed it worked like in other frameworks, but it seems the result of AvgPool has a different shape.
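
The AvgPool observation can be checked with the usual output-size formula for a pooling window, `floor((size + 2 * padding - kernel) / stride) + 1`. The sketch below assumes the feature map entering `avgpool` is 256 x 6 x 6 (as the `256 * 6 * 6` in the first linear layer suggests); `pool_out` is a hypothetical helper, not a SINGA function.

```python
def pool_out(size, kernel, stride, padding=0):
    """Output length along one spatial axis of a pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


channels, height, width = 256, 6, 6  # assumed input to the average pooling layer

# AvgPool2d(6, stride=1) as in the pasted model: a 6x6 window over a 6x6 map.
out_h = pool_out(height, kernel=6, stride=1)
out_w = pool_out(width, kernel=6, stride=1)

flat_features = channels * out_h * out_w
print(out_h, out_w, flat_features)
```

Under that assumption the pooled map is 1 x 1 per channel, so the flattened vector has 256 values, which matches the suggested `Linear(256, 4096)`. A framework that keeps a 6 x 6 map at this stage (for example through adaptive average pooling with a 6 x 6 target) would give 9216 instead.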