Open audreyt opened 11 years ago
Sufficient information, yes, but for now I'm using a PEG parser for the TRS, and it can't deal cleanly with ambiguity. It may be able to handle simple dashless cases (finding bok-bing but not bok-bi-ng).
Another solution could be to replace the PEG with something like CKY, but this will need more coding.
I see, so it's because the PEG matcher can't be coaxed into giving ambiguous parses?
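To illustrate the point about ambiguity, here is a rough sketch of how PEG-style ordered choice behaves on a dashless string. This is not the project's actual grammar: `peg_parse` and the syllable lists are hypothetical names made up for the example. It emulates a rule like `syllable+` followed by end of input, where `syllable` is an ordered choice that commits to the first alternative that matches.

```python
def peg_parse(text, alternatives):
    """Emulate the PEG rule `syllable+ !.` with an ordered choice.

    `alternatives` is a hypothetical, ordered list of syllables.
    At each position the first matching alternative wins, and the
    matcher never goes back to try another one.
    """
    pos, out = 0, []
    while pos < len(text):
        for alt in alternatives:
            if text.startswith(alt, pos):
                out.append(alt)
                pos += len(alt)
                break  # committed: later alternatives are never tried
        else:
            return None  # no alternative matched at this position
    return out


# Listing "bi" before "bing" yields one segmentation only.
print(peg_parse("bokbing", ["bok", "bi", "bing", "ng"]))
# Listing "bing" first yields the other one, again only one.
print(peg_parse("bokbing", ["bok", "bing", "bi", "ng"]))
```

Whichever order the alternatives are listed in, a single parse comes back, so the second segmentation is never reported.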
I think bokbing => bok-bing is good for now, certainly better than nothing at all. :-)
The system should have sufficient information to tokenize the incoming dashless strings into possible segmentations.
For bokbing there are two possible segmentations: bok-bi-ng 莫美秧 (which should match nothing), and bok-bing, which matches the usual 莫名*.
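A minimal sketch of enumerating every segmentation of a dashless string might look like the following. It assumes a known set of valid syllables; the `SYLLABLES` set and the `segmentations` function are hypothetical and only cover this one example, not the real syllable inventory.

```python
def segmentations(text, syllables):
    """Return every way to split `text` into syllables from `syllables`."""
    if not text:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in syllables:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(text[end:], syllables):
                results.append([head] + rest)
    return results


# Hypothetical syllable inventory, just enough for the example above.
SYLLABLES = {"bok", "bi", "ng", "bing"}

print(["-".join(s) for s in segmentations("bokbing", SYLLABLES)])
# prints ['bok-bi-ng', 'bok-bing']
```

Each candidate could then be looked up separately, so that bok-bi-ng matches nothing and bok-bing matches as usual. For long inputs the plain recursion repeats work, and memoizing on the remaining suffix would be the obvious refinement.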