ZHAOZHIHAO / ClusterRouting

PyTorch implementation of the paper "Capsule networks with non-iterative cluster routing".
MIT License

Number of parameters #3

Open kdhasi opened 3 weeks ago

kdhasi commented 3 weeks ago

Hey,

I have been testing this model on low-resolution image data and the classification accuracy is impressive!

A quick question on the parameter count of the network: counting the trainable parameters via gradient flow for the M-variant4 (C4K8D24) gives 3.83M (detailed figure attached), which differs from the value reported in the manuscript, 2.89M.

If it's fine, could you let me know how the parameter count of 2.89M was obtained? Thank you.

[image: detailed per-module parameter count]
ZHAOZHIHAO commented 3 weeks ago

Do you get the same number of parameters for other settings?

kdhasi commented 3 weeks ago

These are the parameter counts I get for the rest of the M-variants with this method:

  1. M-variant1 (C4K5D6): 179k
  2. M-variant2 (C4K5D8): 302k
  3. M-variant3 (C4K8D16): 1.74M

I have tested this using a custom function and with other torch libraries like torchinfo (summary), but all of these methods return the same parameter counts. A sketch of the torchinfo check follows below.
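For reference, a minimal sketch of the torchinfo check (the input size below is a placeholder for low-resolution data such as CIFAR-10, not necessarily the shape used here):

from torchinfo import summary

# Placeholder input shape for low-resolution images (batch 1, 3x32x32);
# adjust to match the actual dataset.
summary(model, input_size=(1, 3, 32, 32))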

Appreciate the quick response.

ZHAOZHIHAO commented 3 weeks ago

Not sure if one of the settings is off, or if it has anything to do with the change you made in the pull request. Could you try this parameter-counting approach: sum(p.numel() for p in model.parameters() if p.requires_grad)?

kdhasi commented 3 weeks ago

This is the custom function I mentioned before, which I used to count the parameters:

from prettytable import PrettyTable

def count_parameters(model):
    # Tabulate every trainable parameter tensor by name.
    table = PrettyTable(["Modules", "Parameters"])
    total_params = 0
    for name, parameter in model.named_parameters():
        if not parameter.requires_grad:
            continue  # skip frozen parameters
        params = parameter.numel()
        table.add_row([name, params])
        total_params += params
    print(table)
    print(f"Total Trainable Params: {total_params}")
    return total_params
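As a sanity check of the counter itself, it can be run on any stand-in nn.Module (torchvision's resnet18 here is just an example, not the capsule network):

import torchvision

# Stand-in model purely to sanity-check the counter; any nn.Module works.
count_parameters(torchvision.models.resnet18())  # ~11.7M trainable params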

I have also tried the line you provided:

def count_parameters(model):
    total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total Trainable Params: {total_params}")
    return total_params

Both methods return the same count as before, 3.83M for the M-variant4.

The pull request was meant to remove some excess parameters; without it, the last capsule layer contains some extra convolutional layers that the model never uses. In fact, without this change, the original code reports 6.33M parameters.
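Since the discrepancy apparently comes from layers that exist in the module but are never used in the forward pass, one way to separate declared from actually-used parameters is to run a single forward/backward pass and count only tensors that receive a gradient. A minimal sketch (loss_fn, sample_input, and sample_target are placeholders, not the repo's training code):

import torch

def count_used_parameters(model, sample_input, sample_target, loss_fn):
    # Parameters of layers the forward pass never touches receive no
    # gradient, so they are excluded from the "used" total.
    model.zero_grad(set_to_none=True)
    loss = loss_fn(model(sample_input), sample_target)
    loss.backward()
    used = sum(p.numel() for p in model.parameters() if p.grad is not None)
    declared = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"used: {used:,}  declared: {declared:,}")
    return used, declared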

ZHAOZHIHAO commented 3 weeks ago

If so, I may have made some undesired changes when I cleaned up my messy code into this neat version for release. I'm trying to find the old code now, but I'm not sure I can locate it. If you change the last layer's capsule dimension to half its length, does it roughly match the number reported in the paper and still give good accuracy?
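For a rough sense of why halving the capsule dimension roughly halves that layer's size: a convolutional layer's parameter count is linear in its output channels, and for a capsule layer the output channels are num_capsules * capsule_dim. A small illustrative sketch (the channel counts are made up, not the repo's actual configuration):

def conv_params(in_ch, out_ch, k):
    # weights (out_ch * in_ch * k * k) plus one bias per output channel
    return out_ch * in_ch * k * k + out_ch

# Illustrative only: 128 input channels, 10 capsules, 3x3 kernel.
full = conv_params(128, 10 * 24, 3)   # capsule dim 24 -> 276,720 params
half = conv_params(128, 10 * 12, 3)   # capsule dim 12 -> 138,360 params
print(full, half)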