koren-v opened this issue 1 month ago
What if you run the same with beam=1?
Output will be the same:
Vanilla output: physician assistants sind medizinische Versorgungsträger, die ärztlichen Versorgungskräfte benötigen, um Krankheiten und Krankheiten zu diagnostischen und zu behandeln und zu prescriben Medikamenten.
Ctranslate output: physician assistants sind medizinische Versorgungsträger, die ärztlichen Versorgungskräfte benötigen, um Krankheiten und Krankheiten zu diagnostischen und zu behandeln und zu prescriben Medikamenten.
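(For clarity, "beam=1" means greedy decoding: only the beam width argument changes in each API. A minimal sketch, assuming the hf_model, fast_model, tokenizer, device, and text objects defined in the reproduction snippet at the bottom of this thread:)

inputs = tokenizer(text, return_tensors="pt").to(device)
ids = hf_model.generate(**inputs, num_beams=1, max_length=1024)  # Transformers, greedy
source = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
results = fast_model.translate_batch([source], beam_size=1, max_decoding_length=1024)  # CTranslate2, greedy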
If you have time, maybe you can test CT2 2.24, before these changes: https://github.com/OpenNMT/CTranslate2/blob/39f48f2e843df52245e6c857326e1115bca12b03/CHANGELOG.md?plain=1#L551-L552, and test with/without allow_early_exit and length_penalty.
OK, I needed to use another model, as T5 was not supported in ctranslate2==2.24.0. Here are the results I got in my experiments:
Convert model:
ct2-transformers-converter --model "beogradjanka/bart_finetuned_keyphrase_extraction" --output_dir "ct2-bart"
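For reference, the same conversion can also be done from Python through the converters API (a minimal sketch of the equivalent call, reusing the model name and output directory from above):

import ctranslate2

# Equivalent of the ct2-transformers-converter CLI call above.
converter = ctranslate2.converters.TransformersConverter("beogradjanka/bart_finetuned_keyphrase_extraction")
converter.convert("ct2-bart")  # writes the optimized binary model to ct2-bart/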
New code snippet:
from itertools import product
import torch
from transformers import BartForConditionalGeneration, AutoTokenizer
import ctranslate2
device = torch.device("cuda")
model_name = "beogradjanka/bart_finetuned_keyphrase_extraction"
hf_model = BartForConditionalGeneration.from_pretrained(model_name).eval().to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)
fast_model = ctranslate2.Translator("ct2-bart", device="cuda")
text = (
"The core CTranslate2 implementation is framework agnostic. The logic that is specific to each framework is moved "
"to a conversion step that loads supported models into a unified representation. The weights are then optionally "
"quantized and saved into an optimized binary format."
)
def get_out(inp, model, length_penalty=1.0, allow_early_exit=None):
    # allow_early_exit has no equivalent in transformers.generate(), so it is ignored here
    inputs = tokenizer(inp, return_tensors="pt")
    ids = model.generate(**inputs.to(device),
                         num_beams=5,
                         min_length=0,
                         max_length=1024,
                         length_penalty=length_penalty)
    return tokenizer.batch_decode(ids, skip_special_tokens=True)[0]

def get_out_fast(inp, model, length_penalty=1.0, allow_early_exit=False):
    source = tokenizer.encode(inp)
    source = tokenizer.convert_ids_to_tokens(source)
    results = model.translate_batch([source],
                                    beam_size=5,
                                    min_decoding_length=0,
                                    max_decoding_length=1024,
                                    length_penalty=length_penalty,
                                    allow_early_exit=allow_early_exit)
    target = results[0].hypotheses[0]
    return tokenizer.decode(tokenizer.convert_tokens_to_ids(target), skip_special_tokens=True)

for lp, aes in product([1.0, 3.0], [False, True]):
    res_vanilla = get_out(text, hf_model, lp, aes)
    res_fast = get_out_fast(text, fast_model, lp, aes)
    print("Predictions are equal:", res_vanilla == res_fast, f", when length_penalty={lp} and allow_early_exit={aes}")
    print("Vanilla output:", res_vanilla)
    print("Ctranslate output:", res_fast)
    print("========================================")
Output:
Predictions are equal: False , when length_penalty=1.0 and allow_early_exit=False
Vanilla output: ctranslate2, framework agnostic, platform agnostic
Ctranslate output: ctranslate2, framework agnostic, framework agnostic
========================================
Predictions are equal: False , when length_penalty=1.0 and allow_early_exit=True
Vanilla output: ctranslate2, framework agnostic, platform agnostic
Ctranslate output: ctranslate2, framework agnostic, framework agnostic
========================================
Predictions are equal: False , when length_penalty=3.0 and allow_early_exit=False
Vanilla output: ctranslate2, framework agnostic, model validation, model conversion
Ctranslate output: ctranslate2, framework agnostic, framework agnostic, model conversion
========================================
Predictions are equal: False , when length_penalty=3.0 and allow_early_exit=True
Vanilla output: ctranslate2, framework agnostic, model validation, model conversion
Ctranslate output: ctranslate2, ctranslate2, framework agnostic, platform agnostic, framework agnostic
Correct me if I misunderstood what you meant by "with/without allow_early_exit and length_penalty".
As Guillaume mentioned before, there are often subtle differences in the way beam search is implemented across frameworks, which can make a slight difference. In my opinion, it looks good in two of the cases.
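One way to dig into those subtle differences is to compare the scores each framework assigns to its hypotheses rather than only the decoded strings. A sketch under the same setup as the snippet above (output_scores/return_dict_in_generate on the Transformers side and num_hypotheses/return_scores on the CTranslate2 side are standard options):

inputs = tokenizer(text, return_tensors="pt").to(device)
hf_out = hf_model.generate(**inputs, num_beams=5, max_length=1024,
                           output_scores=True, return_dict_in_generate=True)
print("HF best-beam score:", hf_out.sequences_scores)  # length-normalized log-prob of the best beam

source = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
results = fast_model.translate_batch([source], beam_size=5, max_decoding_length=1024,
                                     num_hypotheses=5, return_scores=True)
for hyp, score in zip(results[0].hypotheses, results[0].scores):
    print(score, tokenizer.decode(tokenizer.convert_tokens_to_ids(hyp), skip_special_tokens=True))

If the top hypotheses come out with near-identical scores, tie-breaking alone can explain why the two frameworks pick different outputs.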
Just to make sure this issue won't be missed, I will duplicate my response here:
I faced the same issue. However, when I read this response, I thought that if I didn't use any special generation parameters (like no_repeat_ngram_size), I would get the same result. Unfortunately, it seems that there are other differences in the beam search implementation, or am I missing something? To reproduce:
Package versions: transformers==4.34.0, ctranslate2==3.20.0 (as used here).

import torch
from transformers import T5ForConditionalGeneration, AutoTokenizer
import ctranslate2
device = torch.device("cuda")
model_name = "google/flan-t5-base" hf_model = T5ForConditionalGeneration.from_pretrained(model_name).eval().to(device) tokenizer = AutoTokenizer.from_pretrained(model_name)
fast_model = ctranslate2.Translator("ct2-t5-base", device="cuda")
text = "translate English to German: physician assistants are medical providers who are licensed to diagnose and treat illness and disease and to prescribe medication"
def get_out(inp, model):
    inputs = tokenizer(inp, return_tensors="pt")
    ids = model.generate(**inputs.to(device),
                         num_beams=3,
                         min_length=0,
                         max_length=1024)
    return tokenizer.batch_decode(ids, skip_special_tokens=True)[0]

def get_out_fast(inp, model):
    source = tokenizer.encode(inp)
    source = tokenizer.convert_ids_to_tokens(source)
    results = model.translate_batch([source],
                                    beam_size=3,
                                    min_decoding_length=0,
                                    max_decoding_length=1024)
    target = results[0].hypotheses[0]
    return tokenizer.decode(tokenizer.convert_tokens_to_ids(target), skip_special_tokens=True)

res_vanilla = get_out(text, hf_model)
res_fast = get_out_fast(text, fast_model)

print("Vanilla output:", res_vanilla)
print("Ctranslate output:", res_fast)
Vanilla output: physician assistants sind medical providers, die zu Diagnose und Behandlung von Krankheiten und Krankheiten und zu Verknüpfen von Medikamenten zu ermitteln.
Ctranslate output: physician assistants sind medical providers, die zu Diagnose und Behandlung von Krankheiten und Krankheiten und zu Verknüpfen von Medikamenten zu kaufen sind.
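As a sanity check (a sketch, reusing the tokenizer and text from the snippet above), one can verify that the encode → convert_ids_to_tokens round-trip used in get_out_fast is lossless, so the divergence does not come from tokenization:

ids = tokenizer.encode(text)
tokens = tokenizer.convert_ids_to_tokens(ids)
assert tokenizer.convert_tokens_to_ids(tokens) == ids  # the exact same ids reach both models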