Closed by alecristia 6 years ago
With phrasal, using your files and doing:
head -200 tags.txt > train.txt
cat tags.txt | wordseg-prep | wordseg-dibs -t phrasal train.txt | wordseg-eval gold.txt
token_precision 0.3243 token_recall 0.1892 token_fscore 0.239 type_precision 0.2084 type_recall 0.2719 type_fscore 0.2359 boundary_precision 0.7161 boundary_recall 0.3614 boundary_fscore 0.4804
And with baseline...
cat tags.txt | wordseg-prep | wordseg-dibs -t baseline train.txt | wordseg-eval gold.txt
token_precision 0.6308 token_recall 0.6004 token_fscore 0.6152 type_precision 0.4365 type_recall 0.5766 type_fscore 0.4969 boundary_precision 0.832 boundary_recall 0.7844 boundary_fscore 0.8075
I think your problem is that you are using train.txt in its prepared version (i.e. with no word boundaries), so the program considers phones as words. That is, you did something like: head -200 tags.txt | wordseg-prep > train.txt
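To make the pitfall concrete, here is a small sketch with a toy tagged file standing in for the real tags.txt (the ";eword" word-boundary marker follows the wordseg tagged-text convention; the tags_demo.txt / train_demo.txt names are made up for illustration):

```shell
# A toy "tagged" line: phones separated by spaces, words ended by ;eword
printf 'h i ;eword w o r l d ;eword\n' > tags_demo.txt

# Wrong: piping the training lines through wordseg-prep strips the ;eword
# markers, so dibs would see isolated phones instead of words:
#   head -200 tags.txt | wordseg-prep > train.txt

# Right: keep the tagged lines intact for the training file, and only run
# wordseg-prep on the input to be segmented:
head -200 tags_demo.txt > train_demo.txt

# The training file still contains the word boundaries:
grep -c ';eword' train_demo.txt   # counts 1 matching line
```

Only the test input should go through wordseg-prep; the -t phrasal/baseline training file must keep its word boundaries.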
Yes, this must be it! Amazing - performance > .6... Thanks for your patience
I admit I haven't run a perfectly matched test, but if anything, what I ran should have favored the new package. Yet the performance of the new package is abysmal, and far lower than in the old package:
New: baseline, whole corpus training
token_precision 0.1163 token_recall 0.0185 token_fscore 0.03192 type_precision 0.06137 type_recall 0.03102 type_fscore 0.04121 boundary_precision 0 boundary_recall 0 boundary_fscore 0
Old: phrasal, first 200 lines (whole corpus is 300 lines)
token_f-score 0.239 token_precision 0.3243 token_recall 0.1892 boundary_f-score 0.4804 boundary_precision 0.7161 boundary_recall 0.3614
Files: gold.txt, tags.txt