During Training

We need to monitor the progress and stability of training at the end of each epoch. We can consider both quantitative and qualitative evaluations.

Quantitative metric

[ ] ~KL divergence~
- ~(Memory caution?) Need to store MCMC samples for each 'epoch'~
- ~Can only be used for simulations (we do not know the true latent distribution for real data)~ KL is meaningless and hard to evaluate in ALMOND
[ ] Reconstruction error
- Efficiency (train ALMOND after partially training the VAE: train the VAE for 50 epochs, then compare the VAE and ALMOND; is ALMOND more efficient than the VAE?)
- Accuracy (train ALMOND after fully training the VAE: train the VAE for more than 200 epochs, then train ALMOND; does ALMOND overcome the limitation of the ELBO?)
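The efficiency comparison above could be tracked with a small helper like the one below. This is a minimal sketch, not the repository's code: the function names `mse` and `epochs_to_reach`, the target value, and the two toy error histories are all assumptions made for illustration, and the numbers are not measurements from any model.

```python
from statistics import fmean


def mse(originals, reconstructions):
    """Mean squared error over a batch of equally sized flat vectors."""
    squared_errors = [
        (x - x_hat) ** 2
        for row, row_hat in zip(originals, reconstructions)
        for x, x_hat in zip(row, row_hat)
    ]
    return fmean(squared_errors)


def epochs_to_reach(history, target):
    """Return the 1-based epoch at which the error first drops to `target`, or None."""
    for epoch, error in enumerate(history, start=1):
        if error <= target:
            return epoch
    return None


# Toy per-epoch reconstruction errors for two models (hypothetical values,
# made up only to show how the helper is called).
model_a_history = [0.90, 0.60, 0.45, 0.40, 0.38]
model_b_history = [0.80, 0.45, 0.36, 0.33, 0.32]

target = 0.40  # assumed error threshold
print("model A reaches target at epoch:", epochs_to_reach(model_a_history, target))
print("model B reaches target at epoch:", epochs_to_reach(model_b_history, target))
```

Appending `mse(...)` to a history list at the end of every epoch, for both the VAE and ALMOND, gives two curves that can be compared on epochs-to-target (efficiency) and on final value (accuracy).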

After Training

We need to compare the results depending on the method used to infer the latent distribution (variational inference, sampling, and message passing). Although it is well known that sampling and message passing algorithms infer the true latent distribution better than variational inference, the effect of the inference method on the downstream results has not been studied explicitly so far. We can consider both quantitative and qualitative evaluations.

Quantitative metric

[ ] Reconstruction error
- MSE for single cell data
- FID for image data
- Imputation for recommender systems
[ ] Clustering result
- Choose a standard clustering method and apply it to the latent variables
- Calculate the Adjusted Rand Index and Adjusted Mutual Information against the true labels (only when labels are available)
[ ] Classification task in latent space.
- AUROC
- AUPRC
- F1-Score
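Two of the metrics listed above, the Adjusted Rand Index and AUROC, can be sketched with the standard library alone. The function names `adjusted_rand_index` and `auroc` are hypothetical and chosen for this example; in practice a library such as scikit-learn (`adjusted_rand_score`, `roc_auc_score`) would likely be used instead. The AUROC version is the simple quadratic pairwise form and assumes binary labels encoded as 0 and 1.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, cluster_labels):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Count pairs of items that share a label in both, or in each, labeling.
    same_both = sum(comb(c, 2) for c in Counter(zip(true_labels, cluster_labels)).values())
    same_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_cluster = sum(comb(c, 2) for c in Counter(cluster_labels).values())
    expected = same_true * same_cluster / comb(n, 2)  # chance-level agreement
    maximum = (same_true + same_cluster) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (same_both - expected) / (maximum - expected)


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Cluster ids are arbitrary, so a relabeled but identical partition scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because the Adjusted Rand Index is invariant to how clusters are numbered, cluster assignments from the latent space can be compared to the true labels without matching ids first.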

Qualitative metric

[ ] Generated sample quality
[ ] Latent variable visualization

chunhyonho / Almond_torch

Evaluation metric #1

Reconstruction error during training

For training set

For validation set