The ADDA results are indeed impressive. But I am wondering how they compare to:
1) training on MNIST, then fine-tuning on the small USPS dataset;
2) mixing MNIST and the small USPS dataset, and training on the mixed dataset.
I tried 1) and 2) on a document classification (NLP) task and found that both worked very well, improving target classification accuracy from 0.74 to 0.87. So how does ADDA compare to 1) and 2)?