bootphon / wordseg

A Python toolbox for text based word segmentation
https://docs.cognitive-ml.fr/wordseg
GNU General Public License v3.0

TPrel on syll sometimes leads to error, when none of the other algos fail on the same input #25

Closed alecristia closed 6 years ago

alecristia commented 6 years ago

The error occurs fairly often and with different corpora. It is reproducible: the same corpus triggers it every time. Oddly enough, a concatenated corpus that includes a problematic corpus as a proper subset may not trigger the error.

code verbatim

prep

Alejandrinas-MacBook-Air:wordseg acristia$ thisgold="/Users/acristia/Dropbox/gold.txt"
Alejandrinas-MacBook-Air:wordseg acristia$ thisprep="/Users/acristia/Dropbox/prepared.txt"
Alejandrinas-MacBook-Air:wordseg acristia$ thisunit="syllable"
Alejandrinas-MacBook-Air:wordseg acristia$ thistag="/Users/acristia/Dropbox/concat.txt"
Alejandrinas-MacBook-Air:wordseg acristia$ out="/Users/acristia/Dropbox/tprel.txt"

run tp-rel

Alejandrinas-MacBook-Air:wordseg acristia$ cat $thisprep | wordseg-tp -t relative > $out
Alejandrinas-MacBook-Air:wordseg acristia$ cat $out | wordseg-eval $thisgold
fatal error: gold and train have different size: len(gold)=2456, len(train)=2455
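
The size mismatch above means the segmented output has one utterance fewer than the gold file. A minimal Python sketch for locating where the two files diverge is given here. It assumes one utterance per line, and that gold and segmented utterances contain the same symbols once spaces are removed. The helper names read_utterances and first_mismatch are hypothetical and not part of wordseg.

```python
def read_utterances(path):
    """Read a text file and return its lines, one utterance per line."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def first_mismatch(gold, segmented):
    """Return the index of the first utterance where gold and segmented diverge.

    Two utterances are considered aligned when they are identical after
    removing spaces (assumption: segmentation only moves word boundaries).
    Returns None when both lists are fully aligned and of equal length.
    """
    for i, (g, s) in enumerate(zip(gold, segmented)):
        if g.replace(" ", "") != s.replace(" ", ""):
            return i
    if len(gold) != len(segmented):
        # One list is a prefix of the other: divergence starts where it ends.
        return min(len(gold), len(segmented))
    return None


# Toy data standing in for the gold and segmented files (not real corpus data).
toy_gold = ["the dog", "a", "barks"]
toy_segmented = ["thedog", "barks"]
mismatch_index = first_mismatch(toy_gold, toy_segmented)
```

With the files from this report, one could call first_mismatch(read_utterances(thisgold_path), read_utterances(out_path)), where the two path variables are placeholders for the gold and tprel files.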

run tp-abs

Alejandrinas-MacBook-Air:wordseg acristia$ outabs="/Users/acristia/Dropbox/tpabs.txt"
Alejandrinas-MacBook-Air:wordseg acristia$ cat $thisprep | wordseg-tp -t absolute > $outabs
Alejandrinas-MacBook-Air:wordseg acristia$ cat $outabs | wordseg-eval $thisgold
type_fscore 0.467
token_fscore 0.5799
type_precision 0.4033
boundary_recall 0.6038
boundary_fscore 0.7296
token_precision 0.6774
type_recall 0.5546
token_recall 0.5069
boundary_precision 0.9216

minimal reproducible example:

tags file: https://www.dropbox.com/s/fmig92ejds32jwn/concat.txt?dl=0
gold file: https://www.dropbox.com/s/2t7q762l6c4mm9h/gold.txt?dl=0
prepared file: https://www.dropbox.com/s/gtqzvniyuz44f9n/prepared.txt?dl=0
segmented with Relative: https://www.dropbox.com/s/mvozo09cy4ow9an/tprel.txt?dl=0
segmented with Absolute: https://www.dropbox.com/s/p47yko32gb8xlmi/tpabs.txt?dl=0

mmmaat commented 6 years ago

OK, the bug occurs when the last utterance has a single phone. I'm on it!
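
To check whether a given prepared file falls into this case, a rough Python sketch is shown here. It assumes the prepared text has one utterance per line with units (phones or syllables) separated by spaces. The function name last_utterance_is_single_unit is hypothetical and not part of wordseg.

```python
def last_utterance_is_single_unit(lines):
    """Return True if the last non-empty utterance holds exactly one unit.

    Assumption: units are separated by whitespace, one utterance per line.
    """
    non_empty = [line for line in lines if line.strip()]
    if not non_empty:
        return False
    return len(non_empty[-1].split()) == 1


# Toy prepared text (made up for illustration, not taken from the corpus).
failing_case = ["hh ax l ow", "y uw"[:1]]   # last utterance is the single unit "y"
passing_case = ["y", "hh ax l ow"]          # single-unit utterance is not last

failing_flag = last_utterance_is_single_unit(failing_case)
passing_flag = last_utterance_is_single_unit(passing_case)
```

If this diagnosis holds, it would also be consistent with the earlier observation about concatenated corpora: once more utterances follow, the single-unit utterance is no longer the last one.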

mmmaat commented 6 years ago

Whoops, I made a mistake and broke tp relative again... Working on it...