Closed xcluo closed 2 years ago
@rlebras
Sequential training did not hurt performance in any of our experiments. However, while our tasks covered a broad range of domains, they were all multiple choice.
In the extreme, sequential training for long enough on totally random data seems like it would hurt performance. Based on the Unicorn experiments, I expect that sequential training will not hurt performance in most practical situations; but as an empirical finding, it's always possible there are important scenarios our experiments didn't cover.
Phenomenon: In the cost equivalent curves of Figure 2, sequential training uniformly outperforms the single-task baseline (e.g., when the target task is winogrande). In my reproduction, however, the single-task baseline always outperforms all sequential-training runs (with sequential-training update steps varying from 5k to 50k in intervals of 5k), which contradicts the conclusion of Table 1 in the UNICORN paper.

Experiment Setting:
Question:
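For anyone comparing against the Figure 2 curves, here is a minimal sketch of how a cost equivalent curve can be computed: invert the baseline's learning curve to find, for each training budget of the method, the baseline budget that reaches the same score. The accuracy numbers below are illustrative placeholders, not values from the paper, and the function name is mine, not from the UNICORN code.

```python
import numpy as np

# Hypothetical learning curves: accuracy vs. number of target-task
# training examples. Values are made up for illustration.
budgets = np.array([100.0, 1_000.0, 10_000.0, 100_000.0])
baseline_acc = np.array([0.55, 0.62, 0.70, 0.76])    # single-task fine-tuning
sequential_acc = np.array([0.60, 0.67, 0.73, 0.78])  # sequential training

def cost_equivalent(budgets, baseline_acc, method_acc):
    """For each method budget, return the baseline budget that reaches the
    same accuracy, by inverting the baseline learning curve with
    piecewise-linear interpolation in log-budget space."""
    log_b = np.log10(budgets)
    # np.interp needs increasing x, which holds if accuracy is monotone.
    # Accuracies above the baseline's maximum clamp to the largest budget.
    equiv_log_b = np.interp(method_acc, baseline_acc, log_b)
    return 10.0 ** equiv_log_b

equiv = cost_equivalent(budgets, baseline_acc, sequential_acc)
# If sequential training helps, the baseline needs at least as many
# examples to match it at every budget: equiv[i] >= budgets[i].
print(equiv)
```

If your reproduction shows the opposite (the single-task baseline winning everywhere), the equivalent-budget array would fall below the diagonal instead of above it, which is a quick sanity check on which direction your curves disagree with the paper's.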