Closed micronet-challenge-submissions closed 9 months ago
Thanks for your feedback; it helped us find and fix our mistake.
As you can see in (revised)Score_MicroNet.ipynb, we changed the scoring method, and we believe this resolves the overhead issue you mentioned before.
The main counting logic is in the 'Counting' directory, and 'count_hooks.py' is the main file. We suspect this issue is related to the conv counting.
import torch

def count_convNd(m, x, y):
    x = x[0]
    # Multiply-accumulates per output element: kernel spatial size
    # times input channels per group.
    kernel_ops = m.weight.size()[2:].numel() * m.in_channels // m.groups
    bias_ops = 1  # if m.bias is not None else 0
    # Discount pruned weights by the nonzero fraction of the kernel.
    total_add_ops = y.nelement() * (kernel_ops * non_sparsity(m.weight) - 1) + y.nelement() * bias_ops
    total_mul_ops = y.nelement() * kernel_ops * non_sparsity(m.weight)
    total_params = m.weight.numel() * non_sparsity(m.weight) + m.weight.shape[0]
    m.total_add_ops += torch.Tensor([total_add_ops])
    m.total_mul_ops += torch.Tensor([total_mul_ops])
    m.total_params += torch.Tensor([total_params])
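The hook above multiplies the op and parameter counts by `non_sparsity(m.weight)`, a helper that is not shown in the snippet. As a rough sketch (our reconstruction of its likely behavior, not the repository's actual implementation), it would return the fraction of nonzero entries in a tensor:

```python
import torch

def non_sparsity(weight):
    # Fraction of nonzero entries (1.0 means fully dense).
    # Hypothetical reconstruction of the helper used by count_convNd.
    return (weight != 0).float().mean().item()

# A 4x4 weight with its first two rows pruned to zero is half dense.
w = torch.ones(4, 4)
w[:2] = 0
print(non_sparsity(w))  # 0.5
```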
The only overhead issue could occur in the bias operation. We did not use a bias in the conv layers, but we did in batchnorm, so we added the bias counting to the conv term: the 'y.nelement() * bias_ops' and 'm.weight.shape[0]' terms above account for the bias.
We do not consider sparsity in this bias part during training. In detail, during the pruning process we did not prune the biases (including the 1-bit parameters), so we assumed there is no sparsity in the 1-bit parameter terms.
Thanks, Taehyeon Kim
We saw the fixes for the batch norm biases and they look good! I am referring to counting the overhead of storing the convolution and linear layer weights in sparse format. Per the rules, this should be counted as a bitmask, with one bit for each element of the weight tensor to indicate whether it is zero or nonzero. This should be added to the total parameter count.
Trevor
Thanks for the quick reply.
We trained the network parameters in FP32, though, and since the precision of all parameters is the same, we decided the bitmask was not needed here.
For the freebie quantization, we apply this in the Jupyter notebook file.
So, do you mean we should also apply the bitmask in the counting file even though this is a freebie?
Yes, the bitmask is required for sparse weights. To compute with a sparse tensor, the tensor needs to be stored in a compressed format like compressed sparse row that comes with some storage overhead. We take that into account with the bitmask.
Trevor
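To make the storage-overhead point concrete, here is a back-of-the-envelope comparison (a sketch only, not part of the official counting code; the bit widths and the CSR layout details are our assumptions) of the bitmask accounting against a compressed-sparse-row layout for a pruned fp32 matrix:

```python
def bitmask_bits(numel, nnz, value_bits=32):
    # Bitmask format: one mask bit per element, plus the nonzero values.
    return nnz * value_bits + numel

def csr_bits(rows, nnz, value_bits=32, index_bits=32):
    # CSR: nonzero values, one column index per nonzero, and row pointers.
    return nnz * (value_bits + index_bits) + (rows + 1) * index_bits

# A 256x256 fp32 weight pruned to 10% density.
rows = cols = 256
numel = rows * cols
nnz = numel // 10
print(bitmask_bits(numel, nnz))  # 275232 bits
print(csr_bits(rows, nnz))       # 427616 bits
```

With 32-bit indices the bitmask is the cheaper accounting here, which is why the rules settle on it as the uniform overhead charge.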
We are not sure we understand this fully, but we uploaded a new Jupyter notebook file for scoring.
In this code, we add the following term:
def bitmask(net):
    num = 0
    for module in net.parameters():
        # Skip 1-D parameters (biases, batchnorm), which were not pruned.
        if module.ndimension() != 1:
            num += module.numel()
    # 1 bit per parameter, expressed in 32-bit parameter units.
    return num / 32
This function computes the bitmask overhead, and the score is then computed as:
def micro_score(net, precision='Freebie'):
    input = torch.randn(1, 3, 32, 32).to(net.device)
    addflops, multflops, params = count(net, inputs=(input,))
    # Freebie quantization: fp16 weights and multiplies count at half cost.
    if precision == 'Freebie':
        multflops = multflops / 2
        params = params / 2
    # Add the 1-bit-per-element bitmask overhead for sparse weights.
    params += bitmask(net)
    score = params / 36500000 + (addflops + multflops) / 10490000000
    print('Score: {}, flops: {}, params: {}'.format(score, addflops + multflops, params))
    return score
With the bitmask included, the score function changes as above.
The new score is 0.0054.
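As a sanity check on the scoring arithmetic (the input counts below are made up for illustration and are not the submission's actual numbers), the normalization in micro_score reduces to:

```python
def micronet_score(params, addflops, multflops,
                   param_budget=36_500_000, op_budget=10_490_000_000):
    # Same normalization as micro_score above: storage relative to the
    # 36.5M-parameter budget plus math ops relative to the 10.49G-op budget.
    return params / param_budget + (addflops + multflops) / op_budget

# Hypothetical counts for illustration only.
print(round(micronet_score(100_000, 14_000_000, 14_000_000), 4))
```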
Looks good! Thanks for the fix! Two quick questions:
Thanks! Trevor
Also, what name would you like your entries posted under when the results are revealed?
Trevor
Thanks for the reply.
First, if the ver2 network can also be accepted, we want to submit it, but ver1 has the better score. If only one of ver1 and ver2 can be submitted, we will submit ver1; otherwise, we would like to submit both.
Second, the expansion isn't important. Sorry for the confusion.
Do you mean the team name? Our team name is 'KAIST AI', and we would prefer that name when the results are revealed.
You can certainly submit both! Sounds good. Thanks again!
Trevor
If you don't mind, could you give an approximate current ranking for CIFAR-100?
Taehyeon Kim
I can't reveal the results just yet, but I can tell you that we are planning to wrap up the scoring process tomorrow and will be releasing the results early next week.
Trevor
Hi! Thanks for the updates!
Our only outstanding question is about the counting of the mask overhead for sparse weight matrices (1 bit per parameter, including zero-valued parameters). Unless I'm missing something, it doesn't look like this is taken into account in your counting script.
Thanks! Trevor