AutumnSun1996 opened this issue 2 years ago.
Hmm, not sure. It does look like something in Stanza, though. Do you know what type of input causes the issue? You can also run with `export DEBUG=1` to see the sentence boundary detection output.
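To narrow down which inputs trigger the error, one option is to run only the segmentation step over a set of candidate strings and record which ones raise `ValueError`. Below is a minimal sketch with a hypothetical helper, `find_failing_inputs`, and a stand-in segmenter. It calls neither Stanza nor Argos Translate, and the stand-in's failure trigger (a double space) is arbitrary, chosen only for the demo.

```python
from typing import Callable, Iterable, List, Tuple


def find_failing_inputs(
    segment: Callable[[str], object],
    inputs: Iterable[str],
) -> List[Tuple[str, str]]:
    """Hypothetical helper: run a sentence segmenter over each input and
    collect (input, error message) for every input that raises ValueError."""
    failures = []
    for text in inputs:
        try:
            segment(text)
        except ValueError as err:
            failures.append((text, str(err)))
    return failures


def fake_segmenter(text: str) -> list:
    # Stand-in for a real segmenter, for demonstration only.
    # The double-space trigger is an arbitrary assumption for this sketch.
    if "  " in text:
        raise ValueError("substring not found")
    return text.split(". ")


print(find_failing_inputs(fake_segmenter, ["Xin chào. Tạm biệt.", "Xin  chào."]))
# [('Xin  chào.', 'substring not found')]
```

To try this against a real model, `segment` could be a Stanza pipeline such as `stanza.Pipeline(lang="vi", processors="tokenize")`, which can be called on a string. This assumes the Vietnamese tokenizer model has been downloaded, and Argos Translate may configure its packaged model differently.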
Did anyone ever figure out a solution to this? I'm running into the same issue with vi->en. @AutumnSun1996
Could you run it with `export DEBUG=1` and post the output? @yonilevineafs
I reproduced this; it looks like an issue with Vietnamese sentence boundary detection. The root cause could be a bug in Stanza, or the Stanza model may have been mispackaged.
  File "/home/argosopentech/git/translate/env/lib/python3.8/site-packages/argostranslategui/gui.py", line 39, in run
    translated_text = self.translation_function()
  File "/home/argosopentech/git/translate/argostranslate/translate.py", line 52, in translate
    return self.hypotheses(input_text, num_hypotheses=1)[0].value
  File "/home/argosopentech/git/translate/argostranslate/translate.py", line 274, in hypotheses
    translated_paragraph = self.underlying.hypotheses(
  File "/home/argosopentech/git/translate/argostranslate/translate.py", line 159, in hypotheses
    apply_packaged_translation(
  File "/home/argosopentech/git/translate/argostranslate/translate.py", line 388, in apply_packaged_translation
    stanza_sbd = stanza_pipeline(input_text)
  File "/home/argosopentech/git/translate/env/lib/python3.8/site-packages/stanza/pipeline/core.py", line 166, in __call__
    doc = self.process(doc)
  File "/home/argosopentech/git/translate/env/lib/python3.8/site-packages/stanza/pipeline/core.py", line 160, in process
    doc = self.processors[processor_name].process(doc)
  File "/home/argosopentech/git/translate/env/lib/python3.8/site-packages/stanza/pipeline/tokenize_processor.py", line 85, in process
    _, _, _, document = output_predictions(None, self.trainer, batches, self.vocab, None,
  File "/home/argosopentech/git/translate/env/lib/python3.8/site-packages/stanza/models/tokenize/utils.py", line 163, in output_predictions
    st0 = text.index(part, char_offset) - char_offset
ValueError: substring not found
Aborted
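The last frame fails on `text.index(part, char_offset)`. Python's `str.index` raises `ValueError: substring not found` whenever the token string does not occur verbatim in the input at or after the given offset. This thread does not establish why Stanza's Vietnamese tokenizer produces such a token. One way it *could* happen (an assumption, not a confirmed cause) is a Unicode normalization mismatch: Vietnamese diacritics can be encoded composed (NFC) or decomposed (NFD). Below is a minimal sketch of that mechanism, using a hypothetical helper, `locate_token`, that mirrors the failing line.

```python
import unicodedata


def locate_token(text: str, token: str, char_offset: int) -> int:
    # Hypothetical helper mirroring the failing line in the traceback:
    #   st0 = text.index(part, char_offset) - char_offset
    # str.index raises ValueError("substring not found") when `token`
    # does not occur verbatim in `text` at or after `char_offset`.
    return text.index(token, char_offset) - char_offset


# Assumed scenario: the input is decomposed (NFD) while the token string is
# composed (NFC). Both render as "Việt" but differ in code points.
text = unicodedata.normalize("NFD", "Tiếng Việt")
token = unicodedata.normalize("NFC", "Việt")

try:
    locate_token(text, token, 0)
except ValueError as err:
    print("mismatch:", err)  # mismatch: substring not found

# Normalizing the input to NFC first makes the lookup succeed.
nfc_text = unicodedata.normalize("NFC", text)
print(locate_token(nfc_text, token, 0))  # 6
```

If this turned out to be the trigger, normalizing input to NFC before it reaches the pipeline would be a cheap thing to test. The sketch only shows how the error arises. It does not show that normalization is what breaks the vi->en model.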
A simple example:
output:
The bug only occurs for vi->en, so it is probably related to the model that Stanza uses.