adjidieng / ETM

Topic Modeling in Embedding Spaces
MIT License

Validation set loss is being calculated on the Test set. #26

Open jfcann opened 3 years ago

jfcann commented 3 years ago

Unfortunately, the validation loss used while training the model is currently calculated on the test set. As a result, the test-set perplexity is not a reliable indicator of out-of-sample generalisation (cf. main.py, lines 256-282).

The original intention to calculate the validation loss on the validation set is clear from main.py, lines 244-251; however, the variables defined there are never used afterwards in the "evaluate" function.
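
To illustrate the kind of fix I have in mind, here is a minimal, self-contained sketch. It is not the code from main.py: the names `Split`, `pick_split`, `loss_fn` and the `"val"`/`"test"` keys are hypothetical and chosen only for illustration. The point is that the evaluation routine should look up its data from the split it was asked to score, rather than always reading the test set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass
class Split:
    """Hypothetical container for one data split (not the repo's structure)."""
    name: str
    docs: Sequence[Sequence[int]]


def pick_split(source: str, splits: Dict[str, Split]) -> Split:
    """Return the split named by `source`, failing loudly on an unknown name."""
    if source not in splits:
        raise ValueError(f"unknown split {source!r}; expected one of {sorted(splits)}")
    return splits[source]


def evaluate(
    loss_fn: Callable[[Sequence[int]], float],
    source: str,
    splits: Dict[str, Split],
) -> Tuple[str, float]:
    """Average `loss_fn` over the documents of the requested split only."""
    split = pick_split(source, splits)
    if not split.docs:
        raise ValueError(f"split {split.name!r} is empty")
    losses = [loss_fn(doc) for doc in split.docs]
    return split.name, sum(losses) / len(losses)


# Toy data: document length stands in for a real per-document loss.
splits = {
    "val": Split("val", [[1, 2], [3]]),
    "test": Split("test", [[4, 5, 6, 7]]),
}

val_name, val_loss = evaluate(len, "val", splits)     # used during training
test_name, test_loss = evaluate(len, "test", splits)  # used once, at the end
```

With this shape, any model selection done during training would only ever see the `"val"` result, and the `"test"` result is reserved for the final report.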