ziyu123 closed this issue 2 years ago.
Currently, if the ARPA file contains mixed Chinese and English, the output score_hyps always looks like this:
(((-3.4028234663852886e+38, ()), (-3.4028234663852886e+38, (1819, 29)), (-3.4028234663852886e+38, (29,)), (-3.4028234663852886e+38, (1819,)), (-3.4028234663852886e+38, (1819, 29, 2327)), (-3.4028234663852886e+38, (1819, 2327)), (-3.4028234663852886e+38, (29, 2327)), (-3.4028234663852886e+38, (2327,)), (-3.4028234663852886e+38, (1819, 5170)), (-3.4028234663852886e+38, (5170,))), ((-3.4028234663852886e+38, ()), (-3.4028234663852886e+38, (397, 94)), (-3.4028234663852886e+38, (2731, 94)), (-3.4028234663852886e+38, (94,)), (-3.4028234663852886e+38, (397,)), (-3.4028234663852886e+38, (397, 759)), (-3.4028234663852886e+38, (759,)), (-3.4028234663852886e+38, (2731,))))
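Every decoding score is -NUM_FLT_INF, where NUM_FLT_INF = std::numeric_limits<float>::max(). With Chinese only or English only there is no problem. Is a mixed Chinese-English LM not supported at the moment?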
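The value -3.4028234663852886e+38 is exactly -FLT_MAX widened to a double, so these hypotheses effectively received no usable LM score. A small helper can pick out the hypotheses carrying that sentinel. This is a sketch that assumes score_hyps has the nested shape printed above (one tuple per utterance, each a tuple of (score, token_ids) pairs); `unscored_hyps` is a hypothetical name, not part of the decoder:

```python
import struct

# FLT_MAX (std::numeric_limits<float>::max()) rebuilt from its IEEE-754
# bit pattern 0x7F7FFFFF and widened to a Python float (a C double).
FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]


def unscored_hyps(score_hyps):
    """Return, per utterance, the token-id tuples whose score is -FLT_MAX.

    Assumed input shape (as printed in the issue):
        ((score, token_ids), (score, token_ids), ...) per utterance,
    wrapped in an outer tuple over utterances.
    """
    return [
        [ids for score, ids in utterance if score <= -FLT_MAX]
        for utterance in score_hyps
    ]
```

If every hypothesis of every utterance comes back from this helper, the LM scoring step is failing as a whole rather than penalizing a few unlikely paths.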
It works if the language-model training text looks like this:
你 刚 刚 吃 K F C 了 ?
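Producing text in that form amounts to splitting every sentence into single characters, including the letters of English words ("KFC" becomes "K F C"), before the ARPA file is built. A minimal sketch, assuming the training text is plain UTF-8 sentences and that splitting on every non-whitespace character is acceptable (digits and punctuation also become single tokens); `to_char_units` and `convert_corpus` are hypothetical names:

```python
def to_char_units(line: str) -> str:
    """Split a mixed Chinese/English sentence into space-separated characters.

    Existing spaces are dropped and every remaining character becomes its
    own token, so English words are broken into letters.
    """
    return " ".join(ch for ch in line if not ch.isspace())


def convert_corpus(lines):
    """Apply to_char_units to every non-blank line of an LM training corpus."""
    return [to_char_units(line) for line in lines if line.strip()]
```

For example, `to_char_units("你刚刚吃KFC了?")` yields the sentence shown above. Letter case is kept unchanged, so the corpus should already use the same case as the model's output vocabulary.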
Yes. Representing English words as individual characters works; using them as whole words does not.
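To check whether an existing ARPA file still contains whole-word units, one can scan its `\1-grams:` section for entries longer than one character. A sketch, assuming the standard ARPA layout (log10 probability, the n-gram, optional backoff weight) and ignoring the special tokens `<s>`, `</s>` and `<unk>`; `multichar_unigrams` is a hypothetical helper, not a decoder or toolkit API:

```python
SPECIAL_TOKENS = {"<s>", "</s>", "<unk>"}


def multichar_unigrams(lines):
    """Return unigram entries of an ARPA model that span more than one character.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    found = []
    in_unigrams = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("\\"):
            # Section markers: \data\, \1-grams:, \2-grams:, ..., \end\
            in_unigrams = line == "\\1-grams:"
            continue
        if not in_unigrams or not line:
            continue
        fields = line.split()
        # fields[0] is the log10 probability, fields[1] the unigram itself.
        word = fields[1]
        if word not in SPECIAL_TOKENS and len(word) > 1:
            found.append(word)
    return found
```

Usage, with a hypothetical path: `with open("lm.arpa", encoding="utf-8") as f: print(multichar_unigrams(f))`. An empty result means every unigram is a single character, which matches the setup reported to work.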
Close it!