I tried to use dcrnnsupervise.evaluate() to compute the test MAE, but found that the result changes substantially with the test batch size. The cause is that the current implementation computes the (masked) MAE for each batch and then takes a plain average of those per-batch values. This is incorrect: the masked mean divides by the number of unmasked entries, which differs from batch to batch, so the average of per-batch masked MAEs is generally not equal to the masked MAE over the whole test set. The correct approach is to first collect all predictions (and their corresponding targets) and then compute the (masked) MAE once over the full set.
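
The batch-size dependence described above can be reproduced on toy data. The following is a minimal NumPy sketch, not the repository's code: `masked_mae`, `per_batch_average`, and `accumulated_masked_mae` are hypothetical helpers written for illustration, and it assumes that a label of 0 marks a masked (missing) entry.

```python
import numpy as np


def masked_mae(preds, labels, null_val=0.0):
    """MAE over the entries whose label differs from null_val (assumed mask rule)."""
    mask = labels != null_val
    return np.abs(preds - labels)[mask].mean()


def per_batch_average(preds, labels, batch_size):
    """The problematic scheme: masked MAE per batch, then a plain mean of those values."""
    losses = [
        masked_mae(preds[i:i + batch_size], labels[i:i + batch_size])
        for i in range(0, len(labels), batch_size)
    ]
    return float(np.mean(losses))


def accumulated_masked_mae(preds, labels, batch_size, null_val=0.0):
    """Batch-wise but correct: accumulate the error sum and the valid count separately."""
    total_error, total_count = 0.0, 0
    for i in range(0, len(labels), batch_size):
        p, y = preds[i:i + batch_size], labels[i:i + batch_size]
        mask = y != null_val
        total_error += np.abs(p - y)[mask].sum()
        total_count += int(mask.sum())
    return total_error / total_count


# Toy data: the second half has far fewer valid (non-zero) labels than the first.
labels = np.array([1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 5.0])
preds = np.array([2.0, 3.0, 4.0, 5.0, 1.0, 1.0, 1.0, 9.0])

full = masked_mae(preds, labels)                       # 8 / 5 = 1.6
naive = per_batch_average(preds, labels, 4)            # (1.0 + 4.0) / 2 = 2.5
streamed = accumulated_masked_mae(preds, labels, 2)    # 1.6, independent of batch size
print(full, naive, streamed)
```

With a batch size of 4, the single valid entry in the second batch gets the same weight as the four valid entries in the first, which is why the naive average drifts away from the full-set value. Accumulating the error sum and the valid count across batches gives the same result as collecting everything first, without holding all predictions in memory.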