direct-phonology / phoNy

phonology in spaCy!
MIT License

reading produced for non-phonetic characters #18

Closed: thatbudakguy closed this 2 years ago

thatbudakguy commented 2 years ago

the visualization script reveals that we're predicting a reading for punctuation and other non-phonetic characters:

tsiX hjwot bjuw bjuw bjuw bjuw bjuw dzyang bjuw

it looks like bjuw is being used as a catch-all "low-confidence" prediction of some kind.

thatbudakguy commented 2 years ago

this is happening because bjuw is the first tag added to the pipe and thus has label index 0. when the Tagger makes an empty prediction it returns index 0, which then maps to bjuw.
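A minimal sketch of that failure mode in plain Python, not spaCy's actual Tagger internals. It assumes the tagger decodes each token by taking the argmax over per-label scores. Only bjuw comes from this issue: the other labels are illustrative, and `decode`, `with_null_first`, `mask_non_phonetic`, and `NULL_LABEL` are hypothetical helpers, not spaCy API. Because argmax breaks ties toward the first index, an all-zero (or empty) score row lands on whatever label was added first.

```python
# Sketch under an assumption: decoding is argmax over per-label scores.
# Only "bjuw" is from the issue; other labels and all helpers are hypothetical.
LABELS = ["bjuw", "tsiX", "hjwot", "dzyang"]  # insertion order == label index
NULL_LABEL = ""  # hypothetical "no reading" sentinel


def argmax(scores):
    """Return the index of the highest score.

    Ties resolve to the earliest index, so an all-zero row returns 0.
    An empty row also returns 0 because the loop body never runs.
    """
    best = 0
    for i, score in enumerate(scores):
        if score > scores[best]:
            best = i
    return best


def decode(score_rows, labels):
    """Map each token's score row to a label string via argmax."""
    return [labels[argmax(row)] for row in score_rows]


def with_null_first(labels):
    """Put the null label at index 0 so a default 0 means 'no reading'."""
    return [NULL_LABEL] + [label for label in labels if label != NULL_LABEL]


def mask_non_phonetic(tokens, readings):
    """Blank out readings for tokens that are not letters (e.g. punctuation)."""
    return [r if t.isalpha() else NULL_LABEL for t, r in zip(tokens, readings)]


if __name__ == "__main__":
    rows = [
        [0.1, 0.9, 0.0, 0.0],  # confident prediction -> index 1
        [0.0, 0.0, 0.0, 0.0],  # "empty" prediction -> index 0
        [],                    # no scores at all -> index 0
    ]
    print(decode(rows, LABELS))  # ['tsiX', 'bjuw', 'bjuw']
    print(decode([[0.0] * 5], with_null_first(LABELS)))  # ['']
    print(mask_non_phonetic(["天", "，"], ["tsiX", "bjuw"]))  # ['tsiX', '']
```

Reserving index 0 for a null label only helps if the model is actually trained to predict that label for punctuation, since shifting the label list changes every index. Masking after prediction is a cheaper alternative, though `str.isalpha` is a rough stand-in for "phonetic character" and may need a stricter rule for this data.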