Kthyeon / KAIST-AI-NeurIPS2019-MicroNet-2nd-place-solution

NeurIPS 2019 MicroNet Challenge, hosted by Google and DeepMind researchers: "Efficient Model for Image Classification With Regularization Tricks".
http://proceedings.mlr.press/v123/kim20a.html
MIT License

Official Review #1

Closed: micronet-challenge-submissions closed this issue 9 months ago

micronet-challenge-submissions commented 5 years ago

Hi! Thanks for the updates!

Our only outstanding question is about the counting of the mask overhead for sparse weight matrices (1-bit per parameter, including zero valued parameters). Unless I'm missing something, it doesn't look like this is taken into account in your counting script.

Thanks! Trevor

Kthyeon commented 5 years ago

Thanks for your feedback. It helped us find and fix our mistake.

As you can see in (revised)Score_MicroNet.ipynb, we changed the scoring method. We believe this resolves the overhead issue you mentioned.

The main counting logic is in the 'Counting' directory, and 'count_hooks.py' is the main file. This issue is most likely related to the conv counting.

def count_convNd(m, x, y):
    x = x[0]

    # Multiply-accumulates per output element: kernel spatial size
    # times input channels per group.
    kernel_ops = m.weight.size()[2:].numel() * m.in_channels // m.groups
    # Always count one bias add per output element (the conv itself has no
    # bias; the batch-norm bias is folded in here, see below).
    bias_ops = 1

    # Adds: (muls - 1) accumulations per output element, plus the bias add.
    total_add_ops = y.nelement() * (kernel_ops * non_sparsity(m.weight) - 1) + y.nelement() * bias_ops
    # Muls are discounted by the fraction of nonzero weights.
    total_mul_ops = y.nelement() * kernel_ops * non_sparsity(m.weight)
    # Params: nonzero weights, plus one bias per output channel.
    total_params = m.weight.numel() * non_sparsity(m.weight) + m.weight.shape[0]

    m.total_add_ops += torch.Tensor([total_add_ops])
    m.total_mul_ops += torch.Tensor([total_mul_ops])
    m.total_params += torch.Tensor([total_params])

The only overhead issue could occur in the bias operation. We did not use a bias in the conv layers, only in batch norm, so we folded the bias counting into the conv term. The 'y.nelement() * bias_ops' and 'm.weight.shape[0]' terms above account for the bias.

We do not apply sparsity to this bias part. In detail, during the pruning process we did not prune the biases, so we assumed no mask (1-bit-per-parameter) overhead would apply to them.
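For context, `non_sparsity` is not shown in the snippet above; a minimal sketch of its assumed definition (the fraction of nonzero weight entries, matching how it is used in `count_convNd`), demonstrated on a toy pruned layer:

```python
import torch
import torch.nn as nn

def non_sparsity(w):
    # Assumed definition: fraction of nonzero entries in the weight tensor.
    return (w != 0).float().mean().item()

# Tiny demo: a 3x3 conv with roughly half of its weights pruned to zero.
conv = nn.Conv2d(4, 8, kernel_size=3, bias=False)
with torch.no_grad():
    conv.weight[torch.rand_like(conv.weight) < 0.5] = 0.0

density = non_sparsity(conv.weight)                 # roughly 0.5
kernel_ops = conv.weight.size()[2:].numel() * conv.in_channels // conv.groups
print(density, kernel_ops)                          # kernel_ops = 3*3*4 = 36
```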

Thanks, Taehyeon Kim

micronet-challenge-submissions commented 5 years ago

We saw the fixes for the batch norm biases and they look good! I am referring to counting the overhead of storing the convolution and linear layer weights in sparse format. Per the rules, this should be counted as a bitmask, with one bit for each element of the weight tensor to indicate whether it is zero or nonzero. This should be added to the total parameter count.
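Concretely, this rule charges every pruned tensor one mask bit per element on top of its nonzero values. A small sketch of the arithmetic, in 32-bit-parameter units (the function name is ours, not part of the official counting code):

```python
def sparse_param_cost(nonzero, numel, param_bits=32):
    """Storage cost of a pruned tensor in 32-bit-parameter units:
    nonzero values at full precision, plus a 1-bit mask entry for
    every element of the tensor, zero or not."""
    return nonzero + numel / param_bits

# A 100-element tensor pruned down to 25 nonzeros:
print(sparse_param_cost(25, 100))   # 25 values + 100/32 mask bits = 28.125
```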

Trevor


Kthyeon commented 5 years ago

Thanks for the quick reply.

One question: we trained all network parameters in FP32. Since every parameter has the same precision, we decided a bitmask was not needed here.

For the freebie quantization, we apply this in the Jupyter notebook file.

So you mean we should also apply the bitmask in the counting file, even though this is a freebie?

micronet-challenge-submissions commented 5 years ago

Yes, the bitmask is required for sparse weights. To compute with a sparse tensor, the tensor needs to be stored in a compressed format like compressed sparse row that comes with some storage overhead. We take that into account with the bitmask.
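To illustrate why some storage overhead is unavoidable, here is a rough comparison (our own sketch, with illustrative bit widths) of dense storage, a CSR-style index per nonzero, and the challenge's bitmask convention:

```python
def storage_bits(numel, nnz, value_bits=32, index_bits=32):
    # Dense: every element stored at full precision.
    dense = numel * value_bits
    # CSR-style: nonzero values plus one index per nonzero
    # (row pointers omitted for simplicity).
    csr_like = nnz * (value_bits + index_bits)
    # Challenge convention: nonzero values plus one mask bit per element.
    bitmask = nnz * value_bits + numel
    return dense, csr_like, bitmask

# A 288-weight conv kernel pruned to 25% density (72 nonzeros):
print(storage_bits(288, 72))        # (9216, 4608, 2592)
```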

Trevor


Kthyeon commented 5 years ago

We are not sure we understood this fully, but we have uploaded a new Jupyter notebook file for scoring.

In this code, we add the following term:

def bitmask(net):
    num = 0
    for module in net.parameters():
        # Skip 1-D tensors (biases, batch-norm scales/shifts): these
        # are never pruned, so they carry no mask.
        if module.ndimension() != 1:
            num += module.numel()
    # One mask bit per weight element, expressed in 32-bit-parameter units.
    return num / 32

This function computes the bitmask overhead, and

def micro_score(net, precision = 'Freebie'):
    input = torch.randn(1, 3, 32, 32).to(net.device)
    addflops, multflops, params = count(net, inputs=(input, ))

    # Freebie quantization: FP16 halves multiplies and parameter storage.
    if precision == 'Freebie':
        multflops = multflops / 2
        params = params / 2
    # Add the 1-bit-per-weight sparsity mask overhead.
    params += bitmask(net)

    # Normalize against the CIFAR-100 baseline budgets.
    score = params / 36500000 + (addflops + multflops) / 10490000000
    print('Score: {}, flops: {}, params: {}'.format(score, addflops + multflops, params))
    return score

With the bitmask included, the score function changes as above.

The new score is 0.0054.
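As a sanity check on the arithmetic, the denominators in micro_score are the challenge's WideResNet-28-10 CIFAR-100 baseline budgets (36.5M parameters, 10.49B math operations). A standalone sketch with purely hypothetical counts (the numbers below are illustrative, not our model's):

```python
def micronet_score(params, add_ops, mul_ops,
                   param_budget=36_500_000, op_budget=10_490_000_000):
    # Score = parameter ratio + operation ratio against the baseline budgets.
    return params / param_budget + (add_ops + mul_ops) / op_budget

# Hypothetical counts, chosen only to show the scale of the result:
print(round(micronet_score(100_000, 15_000_000, 15_000_000), 4))   # 0.0056
```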

micronet-challenge-submissions commented 5 years ago

Looks good! Thanks for the fix! Two quick questions:

  1. Do you still want to submit your "ver2" model? I ran it and checked the score in your revised Colab and got 0.0056, which is an excellent score.
  2. When I run your updated Colab I get an error passing "expansion = 3" to the MicroNet class. When I remove it, everything appears to work fine. Just want to make sure this isn't important.

Thanks! Trevor

micronet-challenge-submissions commented 5 years ago

Also, what name would you like your entries posted under when the results are revealed?

Trevor

Kthyeon commented 5 years ago

Thanks for the reply.

First, if the ver2 network can also be accepted, we would like to submit it. However, ver1 has the better score, so if only one of ver1 and ver2 may be submitted, we will submit ver1. Otherwise, we want to submit both.

Second, the expansion argument is not important. Sorry for the confusion.

You mean the team name? Our team name is 'KAIST AI'. We prefer this name when the results are revealed.

micronet-challenge-submissions commented 5 years ago

You can certainly submit both! Sounds good. Thanks again!

Trevor


Kthyeon commented 5 years ago

If you don't mind, could you give an approximate current ranking for CIFAR-100?

Taehyeon Kim

micronet-challenge-submissions commented 5 years ago

I can't reveal the results just yet, but I can tell you that we are planning to wrap up the scoring process tomorrow and will be releasing the results early next week.

Trevor
