Closed alecristia closed 6 years ago
The error occurs fairly often and with different corpora. It is reproducible: the exact same corpus fails every time. Oddly enough, a concatenated corpus that properly contains a problematic corpus may not trigger the error.
prep:
$ thisgold="/Users/acristia/Dropbox/gold.txt"
$ thisprep="/Users/acristia/Dropbox/prepared.txt"
$ thisunit="syllable"
$ thistag="/Users/acristia/Dropbox/concat.txt"
$ out="/Users/acristia/Dropbox/tprel.txt"

run tp-rel:
$ cat $thisprep | wordseg-tp -t relative > $out
$ cat $out | wordseg-eval $thisgold
fatal error: gold and train have different size: len(gold)=2456, len(train)=2455

run tp-abs:
$ outabs="/Users/acristia/Dropbox/tpabs.txt"
$ cat $thisprep | wordseg-tp -t absolute > $outabs
$ cat $outabs | wordseg-eval $thisgold
type_fscore 0.467
token_fscore 0.5799
type_precision 0.4033
boundary_recall 0.6038
boundary_fscore 0.7296
token_precision 0.6774
type_recall 0.5546
token_recall 0.5069
boundary_precision 0.9216

minimal reproducible example:
tags file: https://www.dropbox.com/s/fmig92ejds32jwn/concat.txt?dl=0
gold file: https://www.dropbox.com/s/2t7q762l6c4mm9h/gold.txt?dl=0
prepared file: https://www.dropbox.com/s/gtqzvniyuz44f9n/prepared.txt?dl=0
segmented with Relative: https://www.dropbox.com/s/mvozo09cy4ow9an/tprel.txt?dl=0
segmented with Absolute: https://www.dropbox.com/s/p47yko32gb8xlmi/tpabs.txt?dl=0
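To narrow down where the relative-threshold output loses an utterance, one option is to compare the gold and segmented files line by line. The following is only a sketch, not part of wordseg: the helper names `strip_spaces` and `first_divergence` are hypothetical, and it assumes that segmentation only inserts or removes spaces, so that an utterance with all whitespace removed should be identical in both files.

```python
from itertools import zip_longest


def strip_spaces(utterance):
    """Remove all whitespace so only the underlying symbols remain."""
    return "".join(utterance.split())


def first_divergence(gold_lines, segmented_lines):
    """Return the 0-based index of the first utterance that differs.

    Empty lines are ignored. An utterance "differs" when it is missing
    from one side or when its symbols, with whitespace removed, do not
    match. Returns None when both inputs agree everywhere.
    ASSUMPTION: segmentation changes nothing but whitespace.
    """
    gold = [line for line in gold_lines if line.strip()]
    segmented = [line for line in segmented_lines if line.strip()]
    for index, (g, s) in enumerate(zip_longest(gold, segmented)):
        if g is None or s is None or strip_spaces(g) != strip_spaces(s):
            return index
    return None


# Small in-memory example: the last gold utterance is missing on the
# segmented side, mirroring the 2456 vs 2455 mismatch reported above.
gold = ["the dog", "a cat", "I"]
segmented = ["thedog", "a cat"]
print(first_divergence(gold, segmented))  # prints 2
```

On real data the two lists could be read with `open(path, encoding="utf-8").read().splitlines()`, using the gold and tprel files linked above.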
OK, the bug occurs when the last utterance has a single phone. I'm on it!
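A quick way to check whether a given prepared file matches this condition is sketched below. It is not wordseg code: the function names are hypothetical, and it assumes that units (phones or syllables) in the prepared file are separated by whitespace, one utterance per line.

```python
def single_unit_utterances(prepared_lines):
    """Return 0-based indices of utterances made of exactly one unit.

    ASSUMPTION: units are whitespace-separated, one utterance per line.
    """
    return [
        index
        for index, line in enumerate(prepared_lines)
        if len(line.split()) == 1
    ]


def last_utterance_is_single_unit(prepared_lines):
    """Return True when the last non-empty utterance has a single unit."""
    non_empty = [line for line in prepared_lines if line.strip()]
    return bool(non_empty) and len(non_empty[-1].split()) == 1


# Small in-memory example with a single-unit final utterance.
prepared = ["a b c", "d", "e f", "g"]
print(single_unit_utterances(prepared))         # prints [1, 3]
print(last_utterance_is_single_unit(prepared))  # prints True
```

If the second check returns True for a corpus that fails with `-t relative`, that would be consistent with the diagnosis above. It would also fit the observation that concatenated corpora may not fail, since concatenation changes which utterance comes last.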
Whoops, I made a mistake and broke tp relative again... Working on it.