Open RKoopal opened 5 months ago
It would be nice to know whether 'encoder-decoder' architectures are supported, as I also want to use LM-Cocktail for my research 😃
@RKoopal your implementation looks nice, curious to hear back from the maintainers!
@RKoopal , thanks for your suggestion!
MT5 uses the pad_token_id as the starting token when generating decoder_input_ids, but the preprocess function you used doesn't add a special token at the beginning of the decoder_input_ids. I recommend using the official approach documented at https://huggingface.co/docs/transformers/model_doc/mt5#transformers.MT5ForConditionalGeneration.example:

```python
inputs = tokenizer(input_texts, text_target=output_texts, return_tensors="pt")
outputs = model(**inputs)
loss = outputs.loss
```

Passing `text_target` to `tokenizer(...)` lets the tokenizer prepare the labels for you, which is very simple.
You're welcome to submit a PR~
@staoxiao Thank you for your reply! I have implemented your suggestions and will make a PR shortly.
@Nacho888 I'll link the PR here in case you're interested.
@Nacho888 @staoxiao Created PR: https://github.com/FlagOpen/FlagEmbedding/pull/761
I have been using LM-Cocktail for merging language models, specifically the `mix_models_with_data` function. However, I noticed there are only implementations for encoder-only and decoder-only models, not encoder-decoder.
It might be worth adding this functionality to the repo. My own implementation is below; let me know what you think.
The merging was done using two finetuned versions of mT0-small.
```python
model = mix_models_with_data(
    model_names_or_paths=[model1, model2],
    model_type='encoder-decoder',
    example_data=examples,
    temperature=5.0,
    max_input_length=512,
    neg_number=2,
    output_path="output/xnli-ar_de-datamix",
)
```
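For context, the core operation behind data-based mixing is a weighted average of the models' parameters. Below is a minimal, framework-free sketch of that averaging step; `mix_state_dicts` is a hypothetical helper name, and the real LM-Cocktail code operates on torch state_dicts rather than plain floats, but the arithmetic is the same.

```python
def mix_state_dicts(state_dicts, weights):
    """Weighted average of parameter dictionaries -- the core of model
    mixing. Illustrative sketch only: parameters here are plain floats
    (anything supporting * and +); LM-Cocktail itself averages torch
    tensors from each model's state_dict."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged
```
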
Updated `load_model`:
`load_seq2seq_model`:
Updated `compute_weights`:
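Since the updated code isn't reproduced here, a rough sketch of what loss-based weighting with a `temperature` parameter typically looks like: a temperature-scaled softmax over the negated per-model losses, so lower loss yields higher weight. This is my reading of the general technique, not the exact LM-Cocktail formula.

```python
import math

def compute_weights(losses, temperature=5.0):
    """Turn per-model example losses into mixing weights via a
    temperature-scaled softmax over negated losses (lower loss ->
    higher weight). Sketch of the general technique; the exact
    LM-Cocktail implementation may differ."""
    scores = [-loss / temperature for loss in losses]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```
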
`seq2seq_loss`:
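The quantity a seq2seq loss function needs is the token-level cross-entropy over the decoder outputs, averaged over positions whose label is not the ignore index (Hugging Face models compute this internally as `outputs.loss` when `labels` are passed, masking positions labeled -100). A dependency-free sketch over a single sequence, for illustration only:

```python
import math

def seq2seq_loss(logits, labels, ignore_index=-100):
    """Token-level cross-entropy averaged over positions whose label is
    not `ignore_index` -- the quantity HF seq2seq models return as
    `outputs.loss` when `labels` are supplied. Pure-Python sketch:
    `logits` is a list of per-position score lists."""
    total, count = 0.0, 0
    for scores, y in zip(logits, labels):
        if y == ignore_index:
            continue
        m = max(scores)  # log-sum-exp with max subtraction for stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[y]  # equals -log softmax(scores)[y]
        count += 1
    return total / max(count, 1)
```

With uniform scores over a vocabulary of size V, this returns log(V), as expected for an uninformative model.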
`preprocess_data_for_seq2seq`:
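Related to the pad-token point raised above: for MT5 the decoder inputs are the labels shifted one position to the right, with `pad_token_id` (which doubles as the decoder start token) prepended. A minimal sketch of that shift; transformers applies it internally (e.g. T5's `_shift_right`), so this is only to make the preprocessing requirement concrete.

```python
def shift_tokens_right(labels, decoder_start_token_id):
    """Build decoder_input_ids from labels by prepending the decoder
    start token and dropping the final label. For MT5 the start token
    is pad_token_id. Mirrors the shift transformers performs
    internally (e.g. `_shift_right` on T5 models)."""
    return [decoder_start_token_id] + labels[:-1]
```
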
Alternatives
I noticed the decoder preprocessing and loss functions (`preprocess_data_for_llm`, `llm_loss`) also work for mT0. However, the `load_model` function does not allow you to specify 'decoder' for such a model and use those functions directly. Using the decoder functions also gave different results for my experiment:

```
weight for each model:
output/xnli/experiment_mt0-small_xnli_ar 0.48479509353637695
output/xnli/experiment_mt0-small_xnli_de 0.5152048468589783
Saving the new model to output/xnli-arde-datamix
```
My implementation of the preprocess function follows a different style than the decoder function, so I also rewrote it as follows:
NOTE: This implementation gives the same results as using the llm_preprocessor, so I may have made a mistake in one of the two functions. At the moment I cannot say which one is correct without more experiments.
Let me know what you think, or whether I missed existing functionality that makes this obsolete.