Tom-Lotze / FACT

Repo for the project for Fairness, Accountability, Confidentiality and Transparency for the master AI (Jan 2020)

Expected prototype learning behaviour #4

Closed Tom-Lotze closed 4 years ago

Tom-Lotze commented 4 years ago

It seems that our decoder is learning very slowly. The classification accuracy is already at 95% after 1 epoch (also on the validation and test set). However, when we try to visualize the prototypes after 15 epochs (which takes about 30 minutes), we get results like the image below. Meanwhile our losses are around 6000, and the reconstruction loss contributes by far the most to the total loss (and only decreases minimally over time). Is this expected and should we train for the full 1500 epochs, or should we look for a bug somewhere?

thanks!

[Image: prototype1_14]
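For context, the prototype image above is rendered roughly like this (a minimal sketch; `decoder` and `prototype_vectors` stand in for the trained modules and parameters in our notebook, and are replaced by dummies here so the snippet runs on its own):

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Dummy stand-ins for the trained decoder and learned prototype vectors;
# the real ones live in our notebook.
decoder = nn.Sequential(nn.Linear(40, 784), nn.Sigmoid())
prototype_vectors = torch.rand(15, 40)

# Decode each prototype vector back into image space and save one as a picture.
with torch.no_grad():
    decoded = decoder(prototype_vectors).view(-1, 28, 28).cpu()

plt.imshow(decoded[0], cmap="gray")
plt.savefig("prototype1_epoch14.png")
```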

berendjansen commented 4 years ago

Example of the losses:

After one epoch, the validation loss is 7184.705, which consists of:

After the second epoch, the validation loss is 7175.844, with:

We observe similar behaviour in the epochs that follow: the AE error decreases by values ranging from 0.01 to 0.05 per epoch, while the total AE error is still around 7000.
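For reference, the total loss is composed roughly as follows (a minimal sketch assuming the four terms from the prototype paper; the lambda weights and variable names are placeholders, not our exact notebook code):

```python
import torch
import torch.nn.functional as F

# Sketch of the composite loss: classification + reconstruction + the two
# prototype-distance terms. Weights and names are illustrative placeholders.
def total_loss(logits, labels, reconstruction, images, features, prototypes,
               lambda_rec=1.0, lambda_1=1.0, lambda_2=1.0):
    ce = F.cross_entropy(logits, labels)               # classification error
    rec = F.mse_loss(reconstruction, images)           # autoencoder error (dominant term here)
    dists = torch.cdist(features, prototypes) ** 2     # squared latent distances (batch x prototypes)
    r1 = dists.min(dim=0).values.mean()                # each prototype close to some training sample
    r2 = dists.min(dim=1).values.mean()                # each sample close to some prototype
    return ce + lambda_rec * rec + lambda_1 * r1 + lambda_2 * r2
```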

phlippe commented 4 years ago

I would expect the model to focus on the reconstruction, as that loss is about 7000 times higher than any other term. I'll take a look at the code and see if I can spot any bug.

berendjansen commented 4 years ago

We just updated the master branch, the relevant code is in the notebook file. Thanks in advance!

phlippe commented 4 years ago

Two things I noticed so far, although they did not fully resolve the problem. First, the output of the encoder goes through a ReLU, and apparently this is the case in their implementation as well. But in the paper they state that they use a sigmoid to limit the distances, and they initialize the prototypes accordingly. The second aspect concerns the reconstruction loss. For MNIST you could use BCE instead of MSE, as every output is a probability of the pixel being on. Still, in their code they also use MSE. I'll keep looking to see if anything else obvious turns up.
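To make the two points concrete (a minimal sketch; the layer sizes and names are placeholders, not taken from your notebook):

```python
import torch.nn as nn

# Last encoder layer: a sigmoid bounds the latent code in [0, 1], so the
# prototype initialization described in the paper matches the feature range
# (an unbounded ReLU output does not give that guarantee).
encoder_head = nn.Sequential(
    nn.Linear(128, 40),
    nn.Sigmoid(),
)

# Reconstruction loss options for MNIST (pixels in [0, 1]):
mse_loss = nn.MSELoss()   # what their code uses
bce_loss = nn.BCELoss()   # treats each pixel as a Bernoulli probability
```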

phlippe commented 4 years ago

I found the error; it was in the data loading. Your extraction of the images into a TensorDataset leaves the pixel values between 0 and 255, but you want the data to be between 0 and 1. If you divide "x" by 255 after loading, it seems to work fine.
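Roughly like this (a sketch assuming the images come from torchvision's MNIST; the exact loading code in your notebook may differ):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader
from torchvision import datasets

# Load MNIST; .data is a uint8 tensor with pixel values in [0, 255].
mnist = datasets.MNIST(root="./data", train=True, download=True)

# Divide by 255 so the pixels lie in [0, 1] before building the TensorDataset.
x = mnist.data.float() / 255.0
y = mnist.targets

train_set = TensorDataset(x, y)
train_loader = DataLoader(train_set, batch_size=250, shuffle=True)
```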

Tom-Lotze commented 4 years ago

Thanks a lot for your effort!

phlippe commented 4 years ago

I briefly ran your code on a GPU with the changes, and after a few epochs you already seem to get reasonable prototypes. I assume longer training will give you nice results as in the paper.

berendjansen commented 4 years ago

Thanks!