Closed: keighrim closed this issue 4 months ago
In some output MMIF files, each Token annotation has a word property value that starts with a whitespace character. We first need to investigate whether this is a Whisper (upstream) bug, and then make adjustments accordingly.
For example, in aapb-evaluations/asr_eval/preds@whisper-wrapper-tiny@aapb-collaboration-21/cpb-aacip-507-zw18k75z4h.whisper-tiny.mmif:
    {
      "@type": "http://vocab.lappsgrid.org/Token",
      "properties": {
        "word": " Funding", #
        "start": 0,
        "end": 8,
        "document": "v_0:td_1",
        "id": "to_1"
      }
    },
    {
      "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
      "properties": {
        "frameType": "speech",
        "start": 39.58,
        "end": 40.26,
        "id": "tf_1"
      }
    },
    {
      "@type": "http://mmif.clams.ai/vocabulary/Alignment/v1",
      "properties": {
        "source": "tf_1",
        "target": "to_1",
        "id": "al_2"
      }
    },
    {
      "@type": "http://vocab.lappsgrid.org/Token",
      "properties": {
        "word": " for", #
        "start": 9,
        "end": 13,
        "document": "v_0:td_1",
        "id": "to_2"
      }
    },
    ...
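One way to check for and work around the symptom, independent of where the whitespace originates, is to post-process the annotation list. The following is only a minimal sketch: it treats annotations as plain dictionaries shaped like the excerpt above, and the helper names `find_padded_tokens` and `strip_token_words` are hypothetical, not part of the wrapper or of any MMIF library.

```python
import json

# Token type URI as it appears in the excerpt above.
TOKEN_TYPE = "http://vocab.lappsgrid.org/Token"


def find_padded_tokens(annotations):
    """Return ids of Token annotations whose word has surrounding whitespace."""
    padded = []
    for ann in annotations:
        word = ann.get("properties", {}).get("word")
        if ann.get("@type") == TOKEN_TYPE and isinstance(word, str):
            if word != word.strip():
                padded.append(ann["properties"]["id"])
    return padded


def strip_token_words(annotations):
    """Return a copy of the annotations with Token words stripped.

    Only the "word" property is changed; "start" and "end" are copied as-is.
    """
    cleaned = []
    for ann in annotations:
        props = dict(ann.get("properties", {}))
        if ann.get("@type") == TOKEN_TYPE and isinstance(props.get("word"), str):
            props["word"] = props["word"].strip()
        cleaned.append({**ann, "properties": props})
    return cleaned


# Sample data copied from the excerpt in this issue (comment markers removed).
sample = json.loads("""
[
  {"@type": "http://vocab.lappsgrid.org/Token",
   "properties": {"word": " Funding", "start": 0, "end": 8,
                  "document": "v_0:td_1", "id": "to_1"}},
  {"@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
   "properties": {"frameType": "speech", "start": 39.58, "end": 40.26,
                  "id": "tf_1"}},
  {"@type": "http://vocab.lappsgrid.org/Token",
   "properties": {"word": " for", "start": 9, "end": 13,
                  "document": "v_0:td_1", "id": "to_2"}}
]
""")

print(find_padded_tokens(sample))
cleaned = strip_token_words(sample)
print([a["properties"]["word"] for a in cleaned if a["@type"] == TOKEN_TYPE])
```

The sketch deliberately leaves the character offsets untouched. Whether `start` and `end` would also need to be recomputed depends on how the wrapper builds the text document, which this issue does not state.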
This problem is already solved in v6.
Fixed in 80d808d991255d20f7c1c2b9aab6f3a506c869e0 (v4); closing the issue.