The KL divergence loss is 0 for the MNIST one-vs-all classifier at every timestep. It was slightly above 0 when we used 10 classes instead of 2. This may simply be a consequence of the classifier's high accuracy, but it is strange that the loss is exactly 0 at every timestep.
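A minimal sketch of one possible explanation (assuming a PyTorch-style KL loss against one-hot targets; the logits below are made up for illustration): with only 2 classes a confident classifier saturates the softmax, log p(correct) rounds to exactly 0 in float32, and the KL term vanishes, whereas with 10 classes the small probability mass left on the other classes keeps the loss measurably above 0.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits, purely for illustration.
binary_logits = torch.tensor([[12.0, -12.0]])       # saturated one-vs-all output
multi_logits = torch.tensor([[6.0] + [0.0] * 9])    # confident 10-class output

binary_target = torch.tensor([[1.0, 0.0]])          # one-hot targets
multi_target = F.one_hot(torch.tensor([0]), num_classes=10).float()

# F.kl_div expects log-probabilities as input and probabilities as target.
kl_binary = F.kl_div(F.log_softmax(binary_logits, dim=-1), binary_target,
                     reduction="batchmean")
kl_multi = F.kl_div(F.log_softmax(multi_logits, dim=-1), multi_target,
                    reduction="batchmean")

print(kl_binary.item())  # 0.0 -- log p(correct) rounds to 0 once p(correct) ~= 1
print(kl_multi.item())   # ~0.022 -- residual mass on the 9 other classes keeps it > 0
```

If this is the cause, the loss being exactly 0 at every timestep would just mean the one-vs-all classifier is saturated on essentially every example, not that the loss is wired up incorrectly.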