facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

Building dictionary takes too long when trying to further train a custom model #3875

Closed anelkhvan closed 3 years ago

anelkhvan commented 3 years ago

Hello! I have fine-tuned the original Blender 90M further on the four original tasks in Colab:

```python
from parlai.scripts.train_model import TrainModel

TrainModel.main(
    # similar to before
    multitask_weights=[2, 6, 3, 3],
    task='blended_skill_talk,convai2:normalized,empathetic_dialogues,wizard_of_wikipedia',
    model='transformer/generator',
    model_file='finetuned/model',

    # initialize with a pretrained model
    init_model='zoo:tutorial_transformer_generator/model',
    # init_model='finetuned/model',

    # arguments we get from the pretrained model.
    # Unfortunately, these must be looked up separately for each model.
    n_heads=16, n_layers=8, n_positions=512, text_truncate=512,
    label_truncate=128, ffn_size=2048, embedding_size=512,
    activation='gelu', variant='xlm',
    dict_lower=True, dict_tokenizer='bpe',
    dict_file='zoo:tutorial_transformer_generator/model.dict',
    # dict_file='finetuned/model.dict',
    learn_positional_embeddings=True,

    # some training arguments, specific to this fine-tuning
    # use a small learning rate with the Adam optimizer
    lr=1e-5, optimizer='adam',
    warmup_updates=100,
    # early stopping on perplexity
    validation_metric='ppl',
    # train for at most 72000 steps, and validate every 0.25 epochs
    max_train_steps=72000, validation_every_n_epochs=0.25,

    batchsize=8, fp16=True, fp16_impl='mem_efficient',

    # speeds up validation
    skip_generation=True,

    # helps us cram more examples into our GPU at a time
    dynamic_batching='full',
)
```

I have saved the model and wanted to train it further using this script:

```python
from parlai.scripts.train_model import TrainModel

TrainModel.main(
    # similar to before
    multitask_weights=[2, 6, 3, 3],
    task='blended_skill_talk,convai2:normalized,empathetic_dialogues,wizard_of_wikipedia',
    model='transformer/generator',
    model_file=path + model_file,

    # initialize with the fine-tuned model
    init_model=path + model_file,

    # arguments we get from the pretrained model.
    # Unfortunately, these must be looked up separately for each model.
    n_heads=16, n_layers=8, n_positions=512, text_truncate=512,
    label_truncate=128, ffn_size=2048, embedding_size=512,
    activation='gelu', variant='xlm',
    dict_lower=True, dict_tokenizer='bpe',
    dict_file=path + dict_file,
    # dict_file='finetuned/model.dict',
    learn_positional_embeddings=True,

    # some training arguments, specific to this fine-tuning
    # use a small learning rate with the Adam optimizer
    lr=1e-5, optimizer='adam',
    warmup_updates=100,
    # early stopping on perplexity
    validation_metric='ppl',
    # train for at most 72000 steps, and validate every 0.25 epochs
    max_train_steps=72000, validation_every_n_epochs=0.25,

    # depends on your GPU; the ParlAI tutorial says batch size 12 is good for a V100
    batchsize=8, fp16=True, fp16_impl='mem_efficient',

    # speeds up validation
    skip_generation=True,

    # helps us cram more examples into our GPU at a time
    dynamic_batching='full',
)
```

Here `model_file` and `dict_file` are set to the fine-tuned model file and the fine-tuned model dictionary, respectively. However, when I try to run this code, I get the following:

```
10:42:35 | loading fbdialog data: /usr/local/lib/python3.7/dist-packages/data/ConvAI2/train_self_original.txt
Building dictionary:   0%| | 0.00/297k [00:00<?, ?ex/s]
10:43:09 | loading normalized fbdialog data: /usr/local/lib/python3.7/dist-packages/data/ConvAI2/train_self_original.txt
10:43:09 | loading fbdialog data: /usr/local/lib/python3.7/dist-packages/data/ConvAI2/train_self_original.txt
10:43:09 | parlai.tasks.wizard_of_wikipedia.agents.DefaultTeacher' is outputting dicts instead of messages. If this is a teacher that is part of ParlAI, please file an issue on GitHub. If it is your own teacher, please return a Message object instead.
Building dictionary: 142Mex [3:29:43, 11.1kex/s]
```

In short, it ran for more than three hours and the dictionary build still hadn't finished. Are there any arguments I should have specified to speed up the process?

stephenroller commented 3 years ago

Hmm, I don't think the dictionary should be built at all if it's from a pretrained model. Can you try giving the dict_file argument directly, and use the original dict file? (Perhaps you moved the model file but didn't move the dict file with it?)
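One quick way to check that hypothesis is to verify that the files the script points at actually exist; if ParlAI cannot find the file given by `dict_file`, it falls back to building a dictionary from the training tasks. A minimal sketch, with placeholder paths standing in for the `path`, `model_file`, and `dict_file` variables used in the script above:

```python
import os

# Placeholder values; substitute the same `path`, `model_file`, and `dict_file`
# used in the training script above.
path = 'finetuned'
model_file = '/model'
dict_file = '/model.dict'

print('model file exists:', os.path.exists(path + model_file))
print('dict file exists: ', os.path.exists(path + dict_file))
```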

anelkhvan commented 3 years ago

Thank you! I specified `dict_file` directly with the original dict file and it worked!

```python
from parlai.scripts.train_model import TrainModel

TrainModel.main(
    # similar to before
    multitask_weights=[2, 6, 3, 3],
    task='blended_skill_talk,convai2:normalized,empathetic_dialogues,wizard_of_wikipedia',
    model='transformer/generator',
    model_file=path + '/' + model_file,

    # initialize with a pretrained model
    # init_model=path + model_file,

    # arguments we get from the pretrained model.
    # Unfortunately, these must be looked up separately for each model.
    n_heads=16, n_layers=8, n_positions=512, text_truncate=512,
    label_truncate=128, ffn_size=2048, embedding_size=512,
    activation='gelu', variant='xlm',
    dict_file='zoo:tutorial_transformer_generator/model.dict',
    learn_positional_embeddings=True,

    # some training arguments, specific to this fine-tuning
    # use a small learning rate with the Adam optimizer
    lr=1e-5, optimizer='adam',
    warmup_updates=-1,
    # early stopping on perplexity
    validation_metric='ppl',
    # train for at most 72000 steps, and validate every 0.25 epochs
    max_train_steps=72000, validation_every_n_epochs=0.25,

    batchsize=8, fp16=True, fp16_impl='mem_efficient',

    # speeds up validation
    skip_generation=True,

    # helps us cram more examples into our GPU at a time
    dynamic_batching='full',
)
```
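For anyone hitting the same issue: ParlAI only builds a dictionary when the file passed as `dict_file` does not already exist, so pointing `dict_file` at any existing dictionary (the zoo dict used here, or the `model.dict` that ParlAI saves next to a fine-tuned `model_file`) should let training skip the multi-hour dictionary build over all four tasks.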