Fred-Git-Hub closed this issue 3 years ago.
Thank you for the report and the fix; it's great that you included a reproducible test case :-) It looks good to me, but I think someone who knows more about the tagger should take a look too.
@unhammer, I think that's basically only @sanmarf and @jimregan. I'd say the fix looks ready to apply.
LGTM
Is this still relevant, and should it be applied?
I expected the core development team to review this part of the code and fix the bug at an appropriate point in the release schedule.
Hi,
Before going ahead, install apertium-tagger-training-tools and use apertium-tagger-readwords to make sure everything still works.
Regards, Felipe
@TinoDidriksen The test case is still reproducible, and the patch still changes the result in the way we want. Jim said it looked good, so I've applied it.
https://github.com/apertium/lttoolbox/blob/0285babcb7ad1bb86c9b7d88c7b1db90de96b6c9/lttoolbox/pattern_list.cc#L127
result.push_back(int((unsigned char) lemma[i]));
should be
result.push_back(int((wchar_t) lemma[i]));
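Otherwise, Unicode lemmas in the tags-item elements of a TSX file will not work: converting a wide character to unsigned char keeps only its lowest byte, so any character above U+00FF is corrupted.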
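A minimal C++ sketch of the effect, assuming the lemma is a std::wstring (as the proposed wchar_t cast implies). `toSymbols` is a hypothetical stand-in for the loop around line 127 of pattern_list.cc, not the actual lttoolbox code; the `narrowCast` flag only switches between the original and the proposed cast.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the lemma-to-symbol loop in PatternList
// (not the real lttoolbox code). Assumes the lemma is a std::wstring.
std::vector<int> toSymbols(const std::wstring& lemma, bool narrowCast) {
  std::vector<int> result;
  for (std::size_t i = 0; i < lemma.size(); i++) {
    if (narrowCast) {
      // Original cast: conversion to unsigned char keeps only the
      // lowest byte, so characters above U+00FF are corrupted.
      result.push_back(int((unsigned char) lemma[i]));
    } else {
      // Proposed cast: the full wide-character value is preserved.
      result.push_back(int((wchar_t) lemma[i]));
    }
  }
  return result;
}
```

For example, toSymbols(L"\u3042", true) yields {66} (0x42, the code of ASCII 'B'), while toSymbols(L"\u3042", false) yields {12354} (0x3042, HIRAGANA LETTER A). For plain ASCII lemmas both casts give the same result, which would explain why the bug only shows up with non-Latin-1 lemmas.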
Test case: unicode.tsx
Result with the unsigned char cast:
Result with the wchar_t cast:
The same is true for apertium-tagger.
This issue is copied from https://github.com/apertium/lttoolbox/commit/76287d2f2e495d626be7200ea85f7dd712adbb84#commitcomment-36208453