Currently, a regular expression filters out several lines that define valid phrases. For example, the emoji table has phrases containing spaces, and the ipa-x-sampa table has lines with trailing comments; both kinds of lines are currently filtered out.
In phrase_parser, the phrases are parsed as follows:
xingma, phrase, freq = unicode (l, "utf-8").strip ().split ('\t')[:3]
Therefore, it seems reasonable to change the regular expression that checks whether a table line contains a phrase definition so that it accepts every line with 3 tab-separated columns, optionally followed by more tab-separated columns (the optional columns are ignored, i.e. they are just comments in the table source).
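
A minimal sketch of such a check, written for Python 3 and not taken from the actual source: the pattern `PHRASE_LINE` and the helper `parse_phrase_line` are hypothetical names used only for illustration, and the sketch assumes that all three required columns must be non-empty.

```python
import re

# Hypothetical pattern (an assumption, not the real one from the source):
# three non-empty columns separated by tabs, optionally followed by
# further tab-separated columns, which are treated as comments.
PHRASE_LINE = re.compile(r'^[^\t]+\t[^\t]+\t[^\t]+(\t.*)?$')


def parse_phrase_line(line):
    """Return (xingma, phrase, freq) as strings, or None if the line
    does not look like a phrase definition."""
    line = line.strip()
    if not PHRASE_LINE.match(line):
        return None
    # Same slicing as in phrase_parser: only the first 3 columns are used.
    xingma, phrase, freq = line.split('\t')[:3]
    return xingma, phrase, freq
```

With this pattern, a line whose phrase contains spaces (such as "smile<TAB>smiling face<TAB>0") and a line with a trailing comment column are both accepted, while a line with fewer than 3 columns is rejected.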
See: https://bugzilla.redhat.com/show_bug.cgi?id=856903