Unfortunately, the validation loss used during training is currently calculated on the test set, which means the reported test-set perplexity is not a reliable indicator of out-of-sample generalisation (cf. main.py lines 256-282).
The original intention to calculate the validation loss on the validation set is clear from main.py lines 244-251; however, the variables defined there are not used subsequently in the "evaluate" function.
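For illustration, here is a minimal sketch of the intended separation between the splits. The toy model, the make_split helper, and the hyperparameters below are hypothetical stand-ins for this example only, not the actual code in main.py; the point is simply that evaluate should see the validation split during training and the test split only once, for the final report.

```python
import math
import torch
import torch.nn as nn

# Illustrative stand-ins for the real corpus and model (not the main.py code).
vocab_size, emb_size, seq_len = 100, 16, 5
model = nn.Sequential(
    nn.Embedding(vocab_size, emb_size),
    nn.Flatten(),
    nn.Linear(emb_size * seq_len, vocab_size),
)
criterion = nn.CrossEntropyLoss()

def make_split(n):
    """Synthetic (inputs, targets) pair standing in for a data split."""
    x = torch.randint(0, vocab_size, (n, seq_len))
    y = torch.randint(0, vocab_size, (n,))
    return x, y

train_data, val_data, test_data = make_split(64), make_split(16), make_split(16)

def evaluate(data):
    """Average cross-entropy loss of the model on one held-out split."""
    model.eval()
    with torch.no_grad():
        x, y = data
        return criterion(model(x), y).item()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
best_val_loss = float("inf")

for epoch in range(3):
    model.train()
    x, y = train_data
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    # Model selection / annealing decisions look only at the validation split...
    val_loss = evaluate(val_data)
    best_val_loss = min(best_val_loss, val_loss)
    print(f"epoch {epoch}: validation perplexity {math.exp(val_loss):.2f}")

# ...so the test split is touched exactly once, for the final reported metric.
test_loss = evaluate(test_data)
print(f"test perplexity {math.exp(test_loss):.2f}")
```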