Not able to reproduce results for CNN_PHMM_VAE with simulation data

Here are the steps I took to reproduce the results described in the section "Motif dependent embeddings using simulation data". I ran the scripts/multiple.py script with default parameters and observed the following test loss values:

ELBO: 23.95
Reconstruction error: 19.22
KL Divergence: 4.95

To reproduce the plot, I took the following steps (as there is no script for this available in the repository):

1. I took the file out/seqences.txt produced by running scripts/multiple.py and sampled a FASTA file from it.
2. I used the sampled FASTA file as input for the scripts/encode.py script to create latent embeddings with the model trained by scripts/multiple.py.
3. Using the out/embed.seq and out/sequences.txt files created by scripts/encode.py, I plotted the latent embeddings, coloured by their corresponding motif.
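The sampling step could look roughly like the sketch below. It is not a script from the repository: it assumes the sequences file holds one sequence per line, and the function name `sample_fasta`, the `seq_<index>` record names, and the seed handling are hypothetical choices made for illustration.

```python
import random
from pathlib import Path


def sample_fasta(seq_path, fasta_path, n_samples, seed=0):
    """Write a random sample of sequences to a FASTA file.

    Assumes `seq_path` holds one sequence per line. This is an assumption
    about the file layout, not something confirmed by the repository.
    """
    lines = Path(seq_path).read_text().splitlines()
    sequences = [line.strip() for line in lines if line.strip()]

    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    k = min(n_samples, len(sequences))
    # Indices refer to positions among the non-empty lines of the input.
    indices = sorted(rng.sample(range(len(sequences)), k))

    with open(fasta_path, "w") as handle:
        for i in indices:
            # Hypothetical record name; any unique identifier would do.
            handle.write(f">seq_{i}\n{sequences[i]}\n")
    return indices
```

A call such as `sample_fasta("out/seqences.txt", "sampled.fasta", n_samples=1000)` would then produce the input for scripts/encode.py; the output name and sample size here are placeholders. Returning the sampled indices makes it possible to look up the motif of each sampled sequence later.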

This results in the following plot:

[Image: cnn_phmm_vae]
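For anyone who wants to redo the plotting step, a minimal sketch follows. It assumes the embeddings have already been parsed into 2D `(x, y)` points with one motif label per point; how to read them from out/embed.seq depends on that file's format, which is not covered here. The function names `group_by_motif` and `plot_by_motif` are hypothetical, and the plotting part needs matplotlib installed.

```python
from collections import defaultdict


def group_by_motif(embeddings, motif_labels):
    """Group 2D latent points by motif label, one (xs, ys) pair per motif."""
    if len(embeddings) != len(motif_labels):
        raise ValueError("need exactly one motif label per embedding")
    groups = defaultdict(lambda: ([], []))
    for (x, y), label in zip(embeddings, motif_labels):
        xs, ys = groups[label]
        xs.append(x)
        ys.append(y)
    return dict(groups)


def plot_by_motif(groups, out_path):
    """Draw one scatter series per motif so each motif gets its own colour."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, (xs, ys) in sorted(groups.items()):
        ax.scatter(xs, ys, s=4, label=f"motif {label}")
    ax.set_xlabel("latent dimension 1")
    ax.set_ylabel("latent dimension 2")
    ax.legend()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
```

Plotting one series per motif lets matplotlib assign a distinct colour to each motif automatically, which is enough to compare cluster separation between runs.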

However, if I switch off the force_matching option (see https://github.com/Xilorole/raptgen/blob/c4986ca9fa439b9389916c05829da4ff9c30d6f3/scripts/multiple.py#L84), I observe the following test loss values after running scripts/multiple.py with default parameters:

ELBO: 21.21
Reconstruction error: 17.05
KL Divergence: 4.16

These values are quite close to those reported in the paper (20.60, 16.02, and 4.59, respectively). The resulting plot also looks very similar to Fig. 2b (HMM profile).

[Image: cnn_phmm_vae_multiple]
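To make "quite close" concrete, the gap between each run and the paper's values can be computed directly from the numbers quoted in this issue. The snippet is only arithmetic on those numbers; the dictionary layout and the helper name `gaps_to_paper` are illustrative.

```python
# Values reported in the paper, as quoted in this issue.
paper = {"ELBO": 20.60, "Reconstruction error": 16.02, "KL Divergence": 4.59}

# Test losses observed in the two runs described in this issue.
runs = {
    "force_matching enabled": {
        "ELBO": 23.95, "Reconstruction error": 19.22, "KL Divergence": 4.95,
    },
    "force_matching disabled": {
        "ELBO": 21.21, "Reconstruction error": 17.05, "KL Divergence": 4.16,
    },
}


def gaps_to_paper(run, reference):
    """Return observed minus reported value for each metric, to 2 decimals."""
    return {name: round(run[name] - reference[name], 2) for name in reference}


for run_name, values in runs.items():
    print(run_name, gaps_to_paper(values, paper))
```

With force_matching disabled the gaps are 0.61, 1.03, and -0.43 for ELBO, reconstruction error, and KL divergence; with it enabled they are 3.35, 3.20, and 0.36.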

I repeated both experiments (force_matching enabled and force_matching disabled) with different seeds.

Could you clarify which parameters you were using during training of the CNN_PHMM_VAE model?
