I noticed that the default loss sums over the latent dimension in the L1 term but takes the mean over the feature dimension in the reconstruction term. This seems like it would make the optimal l1_coeff vary with model size, since the summed L1 term grows with the number of latents while the reconstruction term is normalized by the number of features. It might be worth taking the mean over both dimensions, so that the optimal l1_coeff is more stable across autoencoder configurations.
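To illustrate the scaling concern, here is a minimal pure-Python sketch. The function names (reconstruction_loss, l1_penalty, sae_loss) and the reduce argument are hypothetical and are not taken from the library; the toy input also assumes every latent has the same activation magnitude, which a real sparse autoencoder would not.

```python
# Hypothetical sketch: all names here are illustrative, not the library's API.

def reconstruction_loss(x, x_hat):
    """Squared error averaged over the feature dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def l1_penalty(latents, reduce="sum"):
    """L1 norm of the latent activations, summed or averaged over latents."""
    total = sum(abs(z) for z in latents)
    if reduce == "sum":
        return total
    if reduce == "mean":
        return total / len(latents)
    raise ValueError(f"unknown reduce mode: {reduce!r}")


def sae_loss(x, x_hat, latents, l1_coeff, reduce="sum"):
    """Reconstruction term plus the weighted L1 sparsity term."""
    return reconstruction_loss(x, x_hat) + l1_coeff * l1_penalty(latents, reduce)


# Toy assumption: identical per-latent activation at two dictionary sizes.
small = [0.5] * 1024
large = [0.5] * 4096

# With "sum" the penalty grows with the number of latents.
print(l1_penalty(small, "sum"), l1_penalty(large, "sum"))
# With "mean" the penalty is independent of the number of latents.
print(l1_penalty(small, "mean"), l1_penalty(large, "mean"))
```

Under this toy assumption, the summed penalty is 4x larger for the 4x larger dictionary, so l1_coeff would need to shrink to keep the same balance against the reconstruction term, whereas the mean-reduced penalty is unchanged.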