a-tsioh / su-lip

A web service for typing Taiwanese (臺語), coded in OPA (compiles to Node.js)

Allow dashless input #3

Open audreyt opened 11 years ago

audreyt commented 11 years ago
$ curl http://su-lip.magistry.fr/_ws_/ -d '{"query":"bokbing"}'
{"fuzzy":[]}

The system should have sufficient information to tokenize the incoming dashless strings into possible segmentations.

For bokbing there are two possible segmentations: bok-bi-ng 莫美秧 (which should match nothing), and bok-bing which matches the usual 莫名*.
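To make the two segmentations concrete, here is a minimal Python sketch that enumerates every split of a dashless string into known syllables. The four-entry `SYLLABLES` set and the `segmentations` helper are assumptions made up for illustration; they are not part of su-lip, and a real inventory would cover the full syllable set.

```python
from typing import List, Set

# Toy syllable inventory, assumed for illustration only.
SYLLABLES: Set[str] = {"bok", "bi", "ng", "bing"}


def segmentations(text: str, syllables: Set[str] = SYLLABLES) -> List[List[str]]:
    """Return every way to split `text` into syllables from `syllables`."""
    if not text:
        # Exactly one way to segment the empty string: no syllables at all.
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in syllables:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(text[end:], syllables):
                results.append([head] + rest)
    return results


candidates = ["-".join(seg) for seg in segmentations("bokbing")]
print(candidates)  # ['bok-bi-ng', 'bok-bing']
```

Each candidate could then be looked up separately, so that bok-bi-ng matches nothing and bok-bing gives the expected result.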

a-tsioh commented 11 years ago

The information is sufficient, but for now I'm using a PEG parser for the TRS, and it can't deal cleanly with ambiguity. It may be able to handle simple dashless cases (finding bok-bing but not bok-bi-ng).

Another solution could be to replace the PEG with something like CKY, but this will need more coding.
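A rough Python sketch of the difference, under stated assumptions: `peg_like_parse` mimics PEG ordered choice (first matching alternative wins, no revisiting), and `count_parses` is a simplified one-dimensional chart in the spirit of CKY rather than a full CKY parser. The syllable list and both function names are hypothetical and do not reflect the actual OPA grammar.

```python
from typing import List, Optional

# Toy alternatives in priority order (longer first), assumed for illustration.
ORDERED_SYLLABLES: List[str] = ["bing", "bok", "bi", "ng"]


def peg_like_parse(text: str) -> Optional[List[str]]:
    """Commit to the first alternative that matches at each position."""
    pos, out = 0, []
    while pos < len(text):
        for syl in ORDERED_SYLLABLES:
            if text.startswith(syl, pos):
                out.append(syl)
                pos += len(syl)
                break  # ordered choice: later alternatives are never tried
        else:
            return None  # no alternative matched here, so the parse fails
    return out


def count_parses(text: str) -> int:
    """Chart over end positions: ways[j] counts segmentations of text[:j]."""
    known = set(ORDERED_SYLLABLES)
    ways = [1] + [0] * len(text)
    for j in range(1, len(text) + 1):
        for i in range(j):
            if text[i:j] in known:
                ways[j] += ways[i]
    return ways[-1]


print(peg_like_parse("bokbing"))  # ['bok', 'bing']
print(count_parses("bokbing"))    # 2
```

The ordered-choice version only ever reports bok-bing, while the chart sees that two segmentations exist, which is the extra information an ambiguity-aware parser would expose.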

audreyt commented 11 years ago

I see, so it's because the PEG matcher can't be coaxed into giving ambiguous parses?

I think bokbing => bok-bing is good for now, certainly better than nothing at all. :-)