chewing / libchewing

libchewing - The intelligent phonetic input method library
https://chewing.im/
GNU Lesser General Public License v2.1

Validate tsi.src and phone.cin #214

Open kcwu opened 8 years ago

kcwu commented 8 years ago

tsi.src and phone.cin have often been broken in the past. It is not only the sorting order; sometimes the syntax is bad too (missing frequency, extra spaces, illegal bopomofo, etc.).

We should validate them in CI to keep them in a good state.

Until somebody writes the validation code, checking the sorting order seems like a good start. cc @PeterDaveHello
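
As a rough illustration of the sorting-order check suggested above, here is a minimal Python sketch. The thread does not say what ordering tsi.src or phone.cin is expected to follow, so the `key` parameter is a placeholder assumption, and `first_unsorted_line` is a hypothetical helper rather than anything that exists in libchewing.

```python
def first_unsorted_line(lines, key=lambda line: line):
    """Return the 1-based number of the first line that sorts before its
    predecessor, or None if the lines are in non-decreasing order.

    `key` is a placeholder: the real ordering of the data files is an
    assumption to be filled in by whoever knows the intended sort rule.
    """
    prev = None
    for lineno, line in enumerate(lines, start=1):
        k = key(line)
        if prev is not None and k < prev:
            return lineno
        prev = k
    return None
```

A CI job could read a file, pass its lines to this function, and fail the build when the result is not `None`.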

czchen commented 8 years ago

I think we already have some checks implemented in https://github.com/chewing/libchewing/blob/master/src/tools/init_database.c for phone.cin and tsi.src. I'm not sure whether any checks are missing for these two.

kcwu commented 8 years ago

init_database.c is tolerant of errors and more robust. For example,

I'd like to have a stricter validator.

czchen commented 8 years ago

@kcwu, do you think we can just use a stricter parser in init_database.c, or do we really need a separate validator?

kcwu commented 8 years ago

These two definitely should be rejected by init_database.c

For blank lines and extra spaces (and sorting order), I'm not sure whether we should enforce them or not.

Billy4195 commented 7 years ago

I can't understand the descriptions above of the situations that init_database.c should avoid. For the first case, does an illegal bopomofo sequence mean a sequence that contains only ˋ ˊ ˇ? For the second case, where do the negative numbers and the non-decimal numbers appear?
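
To make the cases raised in this thread more concrete, here is a minimal Python sketch of a strict per-line check. It rests on several assumptions that the thread does not confirm: that each tsi.src line has the form `phrase frequency syllable...` separated by single spaces, that a syllable is one to three Bopomofo letters optionally followed by one tone mark, and that the frequency must be a plain non-negative decimal integer. `validate_line` is a hypothetical helper, not part of init_database.c.

```python
import re

# Assumption: tone marks follow the letters (U+02CA, U+02C7, U+02CB, U+02D9).
TONES = "\u02ca\u02c7\u02cb\u02d9"
# Assumption: a syllable is 1-3 Bopomofo letters (U+3105..U+3129) plus an
# optional tone mark, so a bare tone mark is rejected.
SYLLABLE_RE = re.compile(r"[\u3105-\u3129]{1,3}[" + TONES + r"]?")
# ASCII digits only: rejects "-1", "0x10", "1e3" and similar.
FREQ_RE = re.compile(r"[0-9]+")


def validate_line(line):
    """Return a list of problems found in one line (empty if none)."""
    problems = []
    # Leading/trailing whitespace or doubled spaces count as extra space.
    if line != line.strip() or "  " in line:
        problems.append("extra space")
    fields = line.split()
    # Assumed layout: phrase, frequency, then at least one syllable.
    if len(fields) < 3:
        problems.append("missing field")
        return problems
    freq, syllables = fields[1], fields[2:]
    if not FREQ_RE.fullmatch(freq):
        problems.append("bad frequency")
    for syllable in syllables:
        if not SYLLABLE_RE.fullmatch(syllable):
            problems.append("illegal bopomofo: " + syllable)
    return problems
```

Under these assumptions, a line whose frequency field is missing is reported as a bad frequency, because the first syllable ends up in the frequency position. Whether blank lines and extra spaces should be errors is left open in the discussion above; the sketch simply reports them.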