Broken prefix bias decoding when enable disable_unk after 2.15.0

Zenglinxiao commented 2 years ago

Hi @guillaumekln, I recently encountered an issue after upgrading ctranslate2 to the latest version: beam search with prefix bias no longer works with --disable_unk. After some debugging and testing, I can confirm that this bug was introduced in release 2.15.0, possibly by #764.

To Reproduce

Take the fairseq en-de WMT16 model and convert it to the ctranslate2 format as described in the documentation.

Before the 2.15.0 release (tested with 2.14.0):

In [1]: import ctranslate2
In [2]: translator = ctranslate2.Translator("en-de.wmt16/model_ct2/", device="cpu", compute_type="default")
In [3]: text = "Le@@ ad researchers say this may bring early detection of cancer , tu@@ ber@@ cu@@ lo@@ sis , HIV and mal@@ aria to patients in low-@@ income countries , where the survival rates for ill@@ nesses such as breast cancer can be half those of ri@@ cher countries ."

In [4]: translator.translate_batch([['Le@@', 'ad', 'researchers', 'say', 'this', 'may']], target_prefix=[['Führ@@', 'ende', 'Forscher', 'sagen', 'dies']], prefix_bias_beta=0.2)
Out[4]: [TranslationResult(hypotheses=[['Führ@@', 'ende', 'Forscher', 'sagen', 'dies', 'kann']], scores=[], attention=[])]
In [5]: translator.translate_batch([['Le@@', 'ad', 'researchers', 'say', 'this', 'may']], target_prefix=[['Führ@@', 'ende', 'Forscher', 'sagen', 'dies']], prefix_bias_beta=0.2, disable_unk=True)
Out[5]: [TranslationResult(hypotheses=[['Führ@@', 'ende', 'Forscher', 'sagen', 'dies', 'kann']], scores=[], attention=[])]

From 2.15.0 onwards (validated with 2.15.0 and 2.21.1):

In [4]: translator.translate_batch([['Le@@', 'ad', 'researchers', 'say', 'this', 'may']], target_prefix=[['Führ@@', 'ende', 'Forscher', 'sagen', 'dies']], prefix_bias_beta=0.2)
Out[4]: [TranslationResult(hypotheses=[['Führ@@', 'ende', 'Forscher', 'sagen', 'dies', 'kann']], scores=[], attention=[])]
In [5]: translator.translate_batch([['Le@@', 'ad', 'researchers', 'say', 'this', 'may']], target_prefix=[['Führ@@', 'ende', 'Forscher', 'sagen', 'dies']], prefix_bias_beta=0.2, disable_unk=True)
Out[5]: [TranslationResult(hypotheses=[['dam@@', 'alige', 'Spitzen@@', 'for@@', 'scher', 'sagen']], scores=[], attention=[])]

As you can see, once disable_unk is enabled, prefix bias decoding (prefix_bias_beta combined with target_prefix) no longer works properly from 2.15.0 onwards, while it worked perfectly with 2.14.0.
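For anyone who wants to check their own version quickly, here is a minimal sketch of the comparison I am doing by eye above. The helper name `starts_with_prefix` is my own and not part of ctranslate2, and the two hypotheses are simply copied from the outputs shown in this report. Since prefix bias is a soft constraint, a mismatch is not a bug in general, but for this example it separates the 2.14.0 behaviour from the 2.15.0 one.

```python
def starts_with_prefix(hypothesis, prefix):
    """Return True if the token list `hypothesis` begins with `prefix`.

    Hypothetical helper for illustration only; not a ctranslate2 function.
    """
    return hypothesis[:len(prefix)] == prefix


# Target prefix passed to translate_batch in the reproduction above.
target_prefix = ['Führ@@', 'ende', 'Forscher', 'sagen', 'dies']

# Hypotheses copied from the outputs reported above (disable_unk=True).
hyp_2_14 = ['Führ@@', 'ende', 'Forscher', 'sagen', 'dies', 'kann']
hyp_2_15 = ['dam@@', 'alige', 'Spitzen@@', 'for@@', 'scher', 'sagen']

print(starts_with_prefix(hyp_2_14, target_prefix))  # True
print(starts_with_prefix(hyp_2_15, target_prefix))  # False
```

With a real model, the same check could be applied to `result.hypotheses[0]` for each entry returned by `translate_batch`.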

Any idea?

guillaumekln commented 2 years ago

Hi,

Thank you for reporting and locating the issue. This is indeed a bug introduced by the PR you referenced.

I will look into how best to fix this issue.

Zenglinxiao commented 2 years ago

Thanks for the fix! I have verified it with some examples, and it reproduces the same results as before 2.15.0. Any chance of a patch release that includes this fix?

guillaumekln commented 2 years ago

There will be a new version by the end of the week.

OpenNMT / CTranslate2

Broken prefix bias decoding when enable disable_unk after 2.15.0 #897