Open George0828Zhang opened 3 years ago
Can you reproduce the result of this example (MuST-C en-de ST)?
The example in the link uses the default `SpeechToText`, which sets `source_dictionary=None`, so this error would not occur. I specifically mentioned that this error only occurs when `source_dictionary!=None`.
Recently I ran ST and ASR on MuST-C en-de, but got terrible results. Have you run into this problem? My issue is #3897.
🐛 Bug
When using `SpeechTextJointToTextTask` (or when adding `source_dictionary` to `SpeechToTextTask`) as the task, `fairseq-generate` and `fairseq-interactive` both fail to produce a result. This is due to the following lines in generate.py and interactive.py:

This line assumes that `src_tokens` are tokens (instead of features) solely based on whether `task.source_dictionary` is None, which is inappropriate. Instead, the decision should be based on the type or dimensions of `src_tokens`, i.e. for speech, `src_tokens` should be a float tensor with 3 dimensions, while for text it should be a long tensor with 2 dimensions.

To Reproduce
Steps to reproduce the behavior (always include the command you ran):
1. Use `SpeechTextJointToTextTask` as task (or inherit `SpeechToTextTask` and add `source_dictionary`).
2. Run `fairseq-generate` or `fairseq-interactive`.
Code sample
Expected behavior
`fairseq-generate` or `fairseq-interactive` should be able to complete, even if `task.source_dictionary` is not None. Whether to decode `src_tokens` using the dictionary should be based on the type or dimensions of `src_tokens`, i.e. for speech, `src_tokens` should be a float tensor with 3 dimensions, while for text it should be a long tensor with 2 dimensions.

Environment
- How you installed fairseq (`pip`, source):

Additional context
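The check proposed under Expected behavior could be sketched roughly as below. This is only an illustration, not the actual fairseq code: `should_decode_source` is a hypothetical helper name, and it relies only on the `dim()` and `is_floating_point()` methods that `torch.Tensor` provides.

```python
def should_decode_source(src_tokens, source_dictionary) -> bool:
    """Decide whether src_tokens can be converted back to a string.

    Hypothetical helper (not part of fairseq). src_tokens is assumed to
    behave like a torch.Tensor, i.e. to expose dim() and is_floating_point().
    """
    # Without a source dictionary there is nothing to decode with.
    if source_dictionary is None:
        return False
    # Text input: integer (long) tensor shaped (batch, length).
    # Speech input: float tensor shaped (batch, time, feature_dim),
    # which must not be passed to the dictionary.
    return (not src_tokens.is_floating_point()) and src_tokens.dim() == 2
```

With PyTorch installed, `should_decode_source(torch.zeros(1, 50, 80), src_dict)` would return False for speech features, while `should_decode_source(torch.zeros(1, 7, dtype=torch.long), src_dict)` would return True for text tokens, where `src_dict` stands for any non-None dictionary.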