allenai / unifiedqa

UnifiedQA: Crossing Format Boundaries With a Single QA System
https://arxiv.org/abs/2005.00700
Apache License 2.0

Training params for smaller models #15

Closed. ag1988 closed this issue 3 years ago.

ag1988 commented 3 years ago

Hi, thank you for sharing your wonderful work and models. I am trying to reproduce your training from T5 using the preprocessed datasets you've provided. In the paper you say you fine-tuned t5-11b with batch size 8 for 100k steps. I don't have the compute for this, so I am only fine-tuning t5-large. I have the following questions:

  1. For unifiedqa-t5-large, what training params did you use, in particular the batch size and the number of optimizer steps?
  2. I am training on | arc_easy | arc_hard | boolq | mctest_corrected_the_separator | narrativeqa | openbookqa | race_string | squad1_1 | squad2. Is this correct, or did you train on some other datasets? I just want to stay as close to your training as possible.
  3. How do you weight the samples from these datasets, given that they have different numbers of examples? Did you just combine them into one training set, or do you normalize each dataset's sampling probability by its size when batching? The sketch below illustrates the two schemes I mean. (EDIT: I just saw in your paper that you weight by dataset size, so this is already clear.)
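
For concreteness, here is a minimal sketch of the two schemes; the dataset sizes are made-up numbers, only for illustration:

# Hypothetical dataset sizes, for illustration only (not the real counts).
dataset_sizes = {"squad1_1": 87_000, "race_string": 62_000, "boolq": 9_400}
total = sum(dataset_sizes.values())

# "Combine into one training set": each dataset is seen in proportion to its size.
proportional = {name: size / total for name, size in dataset_sizes.items()}

# "Normalize by dataset size": every dataset is sampled with equal probability.
uniform = {name: 1.0 / len(dataset_sizes) for name in dataset_sizes}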

Thank you, Ankit

danyaljj commented 3 years ago

Hi there 👋

  1. train_batch_size=64 and 100k steps
  2. Here is the mixture I used:
        "narrativeqa",
        "ai2_science_middle", "ai2_science_elementary",
        "arc_hard", "arc_easy",
        "mctest_corrected_the_separator",
        "squad1_1", "squad2",
        "boolq",
        "race_string",
        "openbookqa",
  3. To clarify the technical details of our approach: we used T5's mixture mechanism (with uniform weights), which makes it easy to train on combinations of multiple datasets.
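
A minimal sketch of registering such a mixture with the t5 library's MixtureRegistry, assuming each dataset above is already registered as a t5 Task under these names (an illustration, not necessarily the exact training code):

import t5

# Task names mirror the mixture listed above; each is assumed to be a registered t5 Task.
UNIFIEDQA_TASKS = [
    "narrativeqa",
    "ai2_science_middle", "ai2_science_elementary",
    "arc_hard", "arc_easy",
    "mctest_corrected_the_separator",
    "squad1_1", "squad2",
    "boolq",
    "race_string",
    "openbookqa",
]

# default_rate=1.0 samples every task at the same rate, i.e. uniform weights.
t5.data.MixtureRegistry.add("unifiedqa_mixture", UNIFIEDQA_TASKS, default_rate=1.0)
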
ag1988 commented 3 years ago

This was very helpful! Thank you so much for the prompt response!

Regards, Ankit

ag1988 commented 3 years ago

Hey Daniel, btw did you also use batch size 64 for the base and small models?

danyaljj commented 3 years ago

The middle numbers here are the batch sizes:

# model_parallelism, train_batch_size, keep_checkpoint_max for different model sizes.
HYPERPARAMETER_DICT = {
    "small": (1, 256, 40),
    "base": (2, 128, 28),
    "large": (8, 64, 24),
    "3B": (8, 16, 30),
    "11B": (8, 8, 30)
}
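
For example, reading the entries back out for the sizes asked about above (the tuple order follows the comment at the top of the dict):

# Batch sizes for the small and base models: 256 and 128, respectively.
for size in ("small", "base"):
    model_parallelism, train_batch_size, keep_checkpoint_max = HYPERPARAMETER_DICT[size]
    print(size, train_batch_size)
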
ag1988 commented 3 years ago

Thanks again, Daniel :-)