Hi,
Thanks for the good work! I am interested in implementing your early-exit-ensembles idea in my neural network, and I am having trouble understanding the diversity loss. I understand that in ensemble learning we want the predictions from each exit to be as diverse as possible. To achieve this, the paper proposes minimising the cross-entropy (CE) loss between pairs of exits in order to maximise mutual information. This statement confuses me: wouldn't minimising the CE between exits encourage them to become more similar, and hence less diverse?
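To make the confusion concrete, here is a minimal sketch of how I am reading the pairwise term. This is only my interpretation, not the paper's code: the helper names `cross_entropy` and `pairwise_ce` and the toy probability vectors are made up for illustration, and I am assuming each exit outputs a softmax distribution over the same classes.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i) for two discrete distributions."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def pairwise_ce(exits):
    """Mean cross-entropy over all ordered pairs of distinct exits (my assumption)."""
    pairs = [(a, b) for i, a in enumerate(exits)
             for j, b in enumerate(exits) if i != j]
    return sum(cross_entropy(a, b) for a, b in pairs) / len(pairs)

# Two hypothetical exits that agree exactly, and two that disagree.
similar = [[0.7, 0.2, 0.1], [0.7, 0.2, 0.1]]
diverse = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]

# The agreeing exits give the smaller pairwise CE.
print(pairwise_ce(similar) < pairwise_ce(diverse))
```

In this toy case the identical exits have the lower pairwise CE, which is why minimising it looks to me like it rewards agreement rather than diversity. If the loss in the paper is defined differently from this sketch, that may be where my misunderstanding lies.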