The ADDA results are indeed impressive. But I am wondering how they compare to:
1) training on MNIST, then fine-tuning on the small USPS dataset;
2) mixing MNIST and the small USPS dataset, and training on the mixed dataset.
I tried 1) and 2) on a document classification (NLP) task and found that both worked very well, improving target classification accuracy from 0.74 to 0.87. So how does ADDA compare to 1) and 2)?