This addresses the issue of the SentencePiece model not joining subword pieces that belong to a single word. For example, if one text segment ends with the piece "with" and the next begins with "out", the model would treat them as two separate words. We want them joined as "without", which requires correcting the prediction list and merging the corresponding entries in the delays and elapsed lists so that the latency metrics stay accurate.
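The merge described above can be sketched roughly as follows. This is an illustrative sketch, not SimulEval's actual implementation: the function name `merge_subwords` and the choice of keeping the latest (max) delay for a merged word are assumptions.

```python
# Illustrative sketch of merging SentencePiece subword pieces into full
# words while keeping the delays/elapsed lists aligned with the merged
# prediction list. Names are hypothetical, not SimulEval's real API.

SPM_MARKER = "\u2581"  # SentencePiece word-boundary marker ("▁")

def merge_subwords(pieces, delays, elapsed):
    """Glue pieces that lack the boundary marker onto the previous word.

    When "out" follows "▁with", the two become "without". The merged
    word's delay/elapsed is the latest (max) of its pieces, since the
    word is not complete until its final piece has been emitted
    (an assumed convention for this sketch).
    """
    words, new_delays, new_elapsed = [], [], []
    for piece, d, e in zip(pieces, delays, elapsed):
        if piece.startswith(SPM_MARKER) or not words:
            # New word starts here.
            words.append(piece.lstrip(SPM_MARKER))
            new_delays.append(d)
            new_elapsed.append(e)
        else:
            # Continuation piece: append to the previous word and
            # update its delay/elapsed to the later value.
            words[-1] += piece
            new_delays[-1] = max(new_delays[-1], d)
            new_elapsed[-1] = max(new_elapsed[-1], e)
    return words, new_delays, new_elapsed

# The source pieces from the example log, with hypothetical per-piece delays.
pieces = "\u2581Let ' s \u2581do \u2581it \u2581with out \u2581hesitation .".split()
words, delays, elapsed = merge_subwords(pieces, list(range(1, 10)), [0] * 9)
# 9 pieces collapse to 5 words; " ".join(words) == "Let's do it without hesitation."
```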
Run spm_detokenizer_agent.py from the SimulEval directory. This is the expected output for instances.log:
{"index": 0, "prediction": "Let's do it without hesitation.", "delays": [3, 6, 6, 9, 9], "elapsed": [0, 0, 0, 0, 0], "prediction_length": 5, "reference": "Let's do it without hesitation.\n", "source": "\u2581Let ' s \u2581do \u2581it \u2581with out \u2581hesitation .", "source_length": 9}
metrics.tsv
LAAL AL AP DAL
3.3 3.3 0.733 3.96
scores.tsv
BLEU LAAL AL AP DAL ATD
100.0 3.3 3.3 0.733 3.96 5.0