ResNet-50 successful training, poor inference

Specifications: ResNet-50 (with transfer learning), DIGITS-6.1.1, NVCaffe-0.17, CUDA-10.1, cuDNN-7.6, Windows-10

By successful, I mean that both accuracy and loss, on the train-set and the validation-set, converged to respectable values: accuracy above 90% and loss below 0.005. However, this performance is not reproduced at inference, whether via DIGITS or NVCaffe. Most of the time, the predictions are biased toward one particular class (out of 4 in total). This would be understandable if the class distribution were imbalanced, but it is not. Furthermore, the same behavior is observed on the train-set itself. Where did the high training accuracy come from if inference on the train-set is exceptionally poor? I tried toggling the preprocessing steps (enabling/disabling scaling, mean subtraction, etc.) during inference, but to no avail.
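To make the bias concrete, here is a minimal sketch of how the predicted class distribution can be compared against the true one. It assumes the inference outputs have already been reduced to integer class indices; the function name `prediction_bias_report` and the toy data are hypothetical and are not part of DIGITS or NVCaffe.

```python
from collections import Counter


def prediction_bias_report(labels, predictions, num_classes=4):
    """Compare true vs. predicted class counts and build a confusion matrix.

    labels and predictions are equal-length lists of integer class
    indices in the range [0, num_classes).
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    true_counts = Counter(labels)
    pred_counts = Counter(predictions)

    # confusion[t][p] = number of samples of true class t predicted as p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, predictions):
        confusion[t][p] += 1

    correct = sum(confusion[c][c] for c in range(num_classes))
    accuracy = correct / len(labels) if labels else 0.0

    return {
        "true_counts": [true_counts.get(c, 0) for c in range(num_classes)],
        "pred_counts": [pred_counts.get(c, 0) for c in range(num_classes)],
        "confusion": confusion,
        "accuracy": accuracy,
    }


# Hypothetical toy data: balanced labels, predictions collapsed onto class 2.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
predictions = [2, 2, 2, 3, 2, 2, 2, 2]

report = prediction_bias_report(labels, predictions)
print(report["true_counts"])  # [2, 2, 2, 2]
print(report["pred_counts"])  # [0, 0, 7, 1]
print(report["accuracy"])     # 0.375
```

A balanced `true_counts` next to a `pred_counts` dominated by one class, on the train-set as well, would confirm that the problem lies in the inference path rather than in the data distribution.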

NVIDIA/caffe issue #576