HazyResearch / metal

Snorkel MeTaL: A framework for training models with multi-task weak supervision
Apache License 2.0

MMTL training accuracies swinging between two datasets #191

Closed Peter-Devine closed 5 years ago

Peter-Devine commented 5 years ago

I am currently using Snorkel MeTaL on two similar datasets with a shared BERT input layer and different linear heads. Both are classification tasks that converge quickly on their own with vanilla BERT. Looking at accuracy across epochs, I can see training and dev accuracy swinging violently from very good on dataset 1 and poor on dataset 2 to the opposite. It does not seem to be converging to a "happy medium" at present.
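Concretely, the architecture is essentially the following (a generic PyTorch sketch of the shared-encoder/two-heads setup I describe, not MeTaL's actual task API; the class and parameter names are illustrative):

```python
import torch.nn as nn
from transformers import BertModel  # encoder choice is an assumption for illustration

class SharedBertTwoHeads(nn.Module):
    """One shared BERT encoder feeding two task-specific linear heads."""
    def __init__(self, num_classes_1, num_classes_2):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.encoder.config.hidden_size
        self.head_1 = nn.Linear(hidden, num_classes_1)  # linear head for dataset 1
        self.head_2 = nn.Linear(hidden, num_classes_2)  # linear head for dataset 2

    def forward(self, input_ids, attention_mask, task):
        # Shared representation: pooled [CLS] output from the common encoder
        pooled = self.encoder(input_ids, attention_mask=attention_mask).pooler_output
        head = self.head_1 if task == 1 else self.head_2
        return head(pooled)
```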

Any ideas/advice on how you overcame this in your RTE tests?

Thanks

bhancock8 commented 5 years ago

Hmm, hard to say without more information. It's possible that your batches are not randomly shuffled somehow (by default they should be). But if your model was seeing all batches of one task followed by all of the other, that could explain the oscillating effect you described. Have you also tried lowering the learning rate to see if that calms things down?
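For reference, lowering the learning rate here means overriding MeTaL's optimizer config (a minimal sketch assuming the `optimizer_config` shape that appears later in this thread; the value is illustrative, not a recommendation):

```python
# Keep the default Adam optimizer but lower its learning rate to damp
# the task-to-task oscillation. The exact value (1e-5) is illustrative.
optimizer_config = {
    "optimizer": "adam",
    "optimizer_common": {"lr": 1e-5},
}
```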

Peter-Devine commented 5 years ago

So I managed to fix this problem by changing the optimiser config to this:

```python
optimizer_config = {
    "optimizer": "sgd",                  # switch from the default Adam to plain SGD
    "optimizer_common": {"lr": 0.005},   # learning rate shared across optimizers
    "sgd_config": {"momentum": 0.01},    # very light momentum
}
```

With this change, both datasets' dev accuracies peak within a few epochs at a level I would expect. My current assumption is that this was a problem with my datasets rather than with the implementation of the default Adam optimiser. I would recommend this fix to anyone else hitting this problem on obscure datasets, as it has worked well for me in this case.
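For completeness, this is roughly how that config plugs into training (a sketch: the `MultitaskTrainer` import path and `train_model` signature are assumptions based on MeTaL's MMTL examples, and `model` / `payloads` stand in for your own objects):

```python
from metal.mmtl.trainer import MultitaskTrainer  # import path assumed

trainer = MultitaskTrainer()
# model and payloads are the MMTL model and task payloads built elsewhere;
# passing optimizer_config here overrides the trainer's default optimizer settings.
trainer.train_model(model, payloads, optimizer_config=optimizer_config)
```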

bhancock8 commented 5 years ago

Thanks for sharing! Glad you got it working for your problem.