bootphon / wordseg

A Python toolbox for text based word segmentation
https://docs.cognitive-ml.fr/wordseg
GNU General Public License v3.0

wordseg-baseline ignores p parameter #20

Closed by alecristia 6 years ago

alecristia commented 6 years ago

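With p=0 each utterance should be treated as a single word; with p=1 a boundary should be placed after every phone (or syllable). The samples below show that the output looks randomly segmented whatever value I pass.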

p=0 sample

y uwaar g aa nax hh ay ddh ax bl aak ih nmay shert aar yuw owyuwkiy p kl ahng k ax ngy ao r hh ehd ay thih ngk y uw aa r geht axngt ay erd ow yae

p=.5

y uwaar g aan axhhay ddh axb laakih nmay sher taar y uw ow yuw kiypk l ah ngk ax ng yaor hheh d ay th ih ng k yuw aar geh t axng t ay erd owy ae

p=1

y uw aargaan axhhayd dhaxb l aakihnmay sh ert aary uw owyuwk iyp kl ahng kaxngy ao rhheh d ay thihngky uw aa rg eh tax ngtay er d ow y ae

mmmaat commented 6 years ago

OK, this is a bug. Thank you for reporting.

mmmaat commented 6 years ago

Hmm... this is working for me. What exact commands did you use? From bash, the probability option is -P/--probability; the -p option sets the phone separator.

(wordseg) mathieu@deaftone:~/dev/wordseg$ head -5 test/data/prepared.txt | wordseg-baseline -P 1
ay m iy n dh ax k aa p s aa r jh ah s t l uh k ax ng f ao r p iy p ax l dh ae t l uh k y ah ng g er 
t eh n p iy p ax l k ao l s ow sh iy z l ay k ih t s iy z iy sh iy z l ay k ay g eh t p ey d t ax 
v eh r iy ae k t ax v ax n ah m 
m iy n y ae dh eh r w aa z ax t ay m ih t w aa z ax p aa r t ah v aw er k ah l ch er w iy d ih d n iy d t ax hh ah n t t ax iy t 
m uw v t ax ax s ax b er b ax n eh r iy ax ax n t ih l k ax l ah m b ax s g eh t s dh eh r ae k t t ax g eh dh er
(wordseg) mathieu@deaftone:~/dev/wordseg$ head -5 test/data/prepared.txt | wordseg-baseline -P 0
aymiyndhaxkaapsaarjhahstluhkaxngfaorpiypaxldhaetluhkyahngger
tehnpiypaxlkaolsowshiyzlaykihtsiyziyshiyzlaykaygehtpeydtax
vehriyaektaxvaxnahm
miynyaedhehrwaazaxtaymihtwaazaxpaartahvawerkahlcherwiydihdniydtaxhhahnttaxiyt
muwvtaxaxsaxberbaxnehriyaxaxntihlkaxlahmbaxsgehtsdhehraekttaxgehdher
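
For reference, the behaviour shown in the two runs above can be sketched in a few lines of Python. This is a minimal illustration, not the actual wordseg source: the function name `baseline_segment` and its `seed` argument are hypothetical, and it assumes the input units (phones or syllables) are separated by plain spaces, as in the prepared text above.

```python
import random


def baseline_segment(utterances, probability=0.5, seed=None):
    """Randomly place word boundaries between units (hypothetical sketch).

    After each space-separated unit, a word boundary (a space) is added
    with the given probability. probability=0 glues each utterance into
    one word; probability=1 puts a boundary after every unit.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")

    rng = random.Random(seed)  # seed only makes the sketch reproducible
    segmented = []
    for utterance in utterances:
        pieces = []
        for unit in utterance.split():
            pieces.append(unit)
            # rng.random() is in [0, 1), so this is never true for
            # probability=0 and always true for probability=1
            if rng.random() < probability:
                pieces.append(" ")
        segmented.append("".join(pieces).strip())
    return segmented


prepared = ["v eh r iy ae k t ax v ax n ah m"]
print(baseline_segment(prepared, probability=0))  # ['vehriyaektaxvaxnahm']
print(baseline_segment(prepared, probability=1))  # ['v eh r iy ae k t ax v ax n ah m']
```

With intermediate values the boundaries vary from run to run unless a seed is fixed, which is why only the two extremes are a reliable check that the parameter is being picked up.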

alecristia commented 6 years ago

Sorry, you're right. Closing this now.