Open qutie75 opened 6 years ago
hi @qutie75 - yes, this is a known issue. -segment_numbers only works with -mode aggressive (so you can use that for the moment). We will fix that, or block use of the option in non-aggressive mode, because it is more in the spirit of "aggressive" than "conservative" tokenization.
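For reference, a sketch of that workaround applied to the command from the question. The only change is the added -mode aggressive; the other flags and file names are copied as posted, and this has not been checked against a specific version.

```shell
# Same invocation as in the question, with "-mode aggressive" added so that
# -segment_numbers takes effect (per the note above; untested sketch).
th tools/tokenize.lua -mode aggressive \
  -case_feature true -segment_case true -segment_numbers true \
  -joiner_annotate true < input_test_en.txt > test.tok
```

Aggressive mode may also split other tokens differently than the default mode, so the rest of the output can change too.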
Hello!
I want to ask about the -segment_numbers option.
If I pass this option when I tokenize, should I see its effect in my output file?
This is my command:
th tools/tokenize.lua -case_feature true -segment_case true -segment_numbers true -joiner_annotate true < input_test_en.txt > test.tok
and the output looks like this:
the│C convention│L in│L 1912│N led│L to│L a│L split│L republican│C party│C ■.│N
I expected 1912 to be segmented like 1 9 1 2, but there is no change…
Please help me. Thank you.