TheStageAI / TorchIntegral

Integral Neural Networks in PyTorch
Apache License 2.0

A question about table1 in the paper #33

Open Iayerlen opened 9 months ago

Iayerlen commented 9 months ago

In the example you provided (/sr/edsr.py), before converting to an INN, the pixel-shuffle structure of EDSR was replaced with torch.nn.Upsample, meaning that the structures of the INN model and the DNN model are not exactly the same. May I ask: in the EDSR 4x experiment in Table 1(b) of the paper, are the Discrete and INN structures used for comparison exactly the same?

b1n0 commented 9 months ago

Of course. We replaced pixel shuffle with upsample, trained the discrete model for comparison, and only after that converted the model to an integral one. In fact, this replacement of pixel shuffle is optional, and one can compare models without it; in that case, the input channels of the pixel shuffle should not be continuous (prunable).
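To illustrate the swap being discussed, here is a minimal, hypothetical sketch (not the repository's actual code) of the two EDSR tail variants: the standard conv + PixelShuffle upsampler, whose conv output channels are tied to scale**2 sub-pixel groups, versus an Upsample + conv tail whose channels stay continuous and hence prunable. The layer sizes (n_feats = 64, scale = 4) are illustrative assumptions.

```python
import torch
import torch.nn as nn

n_feats = 64  # assumed feature width, as in standard EDSR configs
scale = 4     # 4x super-resolution

# Standard EDSR tail: the conv must emit n_feats * scale**2 channels,
# which PixelShuffle rearranges into spatial positions. Pruning these
# channels would break the fixed sub-pixel grouping.
pixel_shuffle_tail = nn.Sequential(
    nn.Conv2d(n_feats, n_feats * scale ** 2, 3, padding=1),
    nn.PixelShuffle(scale),
)

# Replacement tail: interpolate first, then convolve. Channel counts are
# no longer coupled to the scale factor, so they remain prunable/continuous.
upsample_tail = nn.Sequential(
    nn.Upsample(scale_factor=scale, mode="nearest"),
    nn.Conv2d(n_feats, n_feats, 3, padding=1),
)

x = torch.randn(1, n_feats, 8, 8)
# Both tails map (1, 64, 8, 8) -> (1, 64, 32, 32)
print(pixel_shuffle_tail(x).shape, upsample_tail(x).shape)
```

Both variants produce the same output shape, which is why a discrete model retrained with the Upsample tail can be compared like-for-like before and after conversion to an integral model.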

Iayerlen commented 9 months ago

Thank you for your reply. I would like to ask another question: in Table 1, INN-init achieves better results than Discrete on the classification task, but on the super-resolution task INN-init does not exceed Discrete. Could you explain why this is the case? Can adding "grid_tuning", or changing the fixed partition to a trainable partition, make INN-init exceed the DNN?

b1n0 commented 9 months ago

This is a good question. We also noticed that in some classification problems the quality of integral models is higher than that of discrete models. This has not been fully investigated. It may be that the pre-trained models from torchvision are not very well trained, since additional training of the pre-trained models gives the same quality as the integral model. Nevertheless, in tasks where networks easily overfit the training dataset, integral networks may give an increase in quality, because they can be regarded as a form of regularization or ensembling.

Iayerlen commented 9 months ago

That makes a lot of sense, thank you for your answer. But I haven't fully understood yet, so please allow me to ask another question. According to your explanation, in the EDSR 4x experiment it is expected that INN-init does not exceed Discrete, right? That is to say, if the discrete model is fully trained, its performance after being converted to an INN should not be better than the DNN's, right?