Closed. blankspark closed this issue 2 years ago.
Thanks for your attention to our work. The choice of the initial number (e.g., 20000) does not change the recognition results, because we only use the decoded indexes for prediction. It seems that the vocabulary in ctcdecode mainly helps with word segmentation (space_id), which does not exist in current CSLR datasets.
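To illustrate why the offset should not matter, here is a minimal sketch in plain Python. It does not call ctcdecode; it swaps in greedy (best-path) CTC decoding instead of beam search, and the names `build_vocab`, `greedy_ctc_decode`, `decode_to_glosses`, the toy scores, and the gloss table are all hypothetical. The point is only that the placeholder characters cancel out once the result is mapped back to class indexes.

```python
def build_vocab(num_classes, offset=20000):
    # One arbitrary, distinct placeholder character per class index.
    return [chr(offset + i) for i in range(num_classes)]


def greedy_ctc_decode(frame_scores, blank_id=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    decoded = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank_id:
            decoded.append(idx)
        prev = idx
    return decoded


def decode_to_glosses(frame_scores, id_to_gloss, offset):
    vocab = build_vocab(len(frame_scores[0]), offset)
    char_to_id = {c: i for i, c in enumerate(vocab)}
    indexes = greedy_ctc_decode(frame_scores)
    # Round-trip through the placeholder characters to show they cancel out.
    chars = [vocab[i] for i in indexes]
    return [id_to_gloss[char_to_id[c]] for c in chars]


# Toy per-frame scores for 3 classes (class 0 is assumed to be the CTC blank).
scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
]
id_to_gloss = {1: "HELLO", 2: "WORLD"}  # hypothetical gloss dictionary

result_a = decode_to_glosses(scores, id_to_gloss, offset=20000)
result_b = decode_to_glosses(scores, id_to_gloss, offset=30000)
```

Both offsets give the same gloss sequence here, since only the index of each character in the vocabulary list is ever used.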
Similar issue about the alignment results.
Thank you for your reply! I will try to implement it.
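Your work is excellent! According to the ctcdecode documentation, vocab should be initialized with the dictionary to be decoded, so why does the code work with chr(20000+(0~1296))? Is the number 20000 specific? Also, Figure 5 in your paper shows the alignment of the model-generated labels with the ground truth and the video, but with ctcdecode I can only generate labels, which cannot be used for alignment annotation. Does this part require additional code? Looking forward to your reply!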
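For the alignment part of the question, one simple possibility is to read frame spans directly off the per-frame argmax (best-path alignment). The sketch below is only an assumption about how such an alignment could be obtained, not necessarily how Figure 5 was produced; `frame_alignment` and the toy scores are hypothetical, and class 0 is assumed to be the CTC blank.

```python
def frame_alignment(frame_scores, blank_id=0):
    """Return (label_index, start_frame, end_frame) spans from the per-frame argmax."""
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    segments = []
    prev = blank_id
    for t, idx in enumerate(best):
        if idx != blank_id:
            if idx == prev:
                # Same non-blank label as the previous frame: extend the open span.
                label, start, _ = segments[-1]
                segments[-1] = (label, start, t)
            else:
                # A new label (or the same label after a blank) opens a new span.
                segments.append((idx, t, t))
        prev = idx
    return segments


# Toy per-frame scores for 3 classes over 6 frames.
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
]
spans = frame_alignment(toy_scores)
```

Each span gives the first and last frame where a label is the top prediction. Frames may need to be mapped back to original video frames if the model downsamples in time.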