New Feature: IPA input/output for words

SaphireLattice commented 2 years ago

Somewhat related to #1; this could probably be done in bulk with it. Also relevant is #3, for assigning your own custom representations and showing some of the ones already in use (IPA, mine, and the one used by the audio decoding project).

Erquint commented 1 year ago

IPA is a given.

I've spent the last day burying myself in the phonetic construction of the language to get very confident with it before I enable the spoiled mode, and I couldn't help but notice that the developers must have strictly adhered to the American English IPA transcriptions given on Wiktionary. Much to the detriment of the conlang, I should add, since it ends up carrying over American English's redundant, overabundant vowel inventory. I'm convinced they had an IPA-to-cubescript translator coded up in development in addition to manual input. I did spot some quite mistaken transcriptions made by the developers, due to either a bad IPA transcription source or human error. One example is bc9f /oʊt/ trying to spell "out" on manual page 16, which should have been a88c /aʊt/ as used in the rest of the manual.

And of course, spelling phonemes using Latin characters with English-speaker assumptions has always been horrid, given how starkly disjointed phonemes are from glyphs in the English language. There's never a way to escape the ambiguity.

Erquint commented 1 year ago

Since I recently enabled the spoiled mode, I'm noticing many issues with vowel transcriptions the webapp currently displays. Maybe I'm wrong with my assumptions, but it seems to me that it misses a lot of interstitial vowels. See for yourself: I'm attaching my rough notes.

I'm sure I got something wrong too.

There's two /əɹ/s. ~~The bottom one should actually be /ɔɹ/, even though there is already another /ɔɹ/.~~ The bottom one actually seems bogus. No idea where it came from.
That /æɹ/ is rather /ɛ(ə)ɹ/ or even /ɛɚ/, but is probably sufficient to represent as /ɛɹ/. Even if it still feels to me like /æɹ/ kinda encompasses an approximation of those options — I'm afraid that might be my bias.
The top /aɹ/ should be /ɑɹ/.
The /ei/ should be /eɪ/.
The /ə/ at the very top is basically /ʌ/ which is very prone to degrade into schwa in context, especially since cubescript doesn't denote stress/accent.

Here's an updated version with those corrections. Hopefully this helps.

I just double-checked all of the vowels, and there are a lot of discrepancies with what the webapp currently presents using the unreliable medium of Englatin.

Erquint commented 1 year ago

A snippet of minimal modification to your source.

const phonemes = {
    out: [
        { mask: 0b0001_0000_0001_0011, text: "æ" },
        { mask: 0b0001_0000_0001_0001, text: "ɔ" },
        { mask: 0b0000_1100_0000_0000, text: "ɪ" },
        { mask: 0b0001_1100_0001_0000, text: "ɛ" },
        { mask: 0b0001_0100_0001_0000, text: "ʊ" },
        { mask: 0b0000_0000_0000_0011, text: "ʌ" },

        { mask: 0b0001_1100_0001_0001, text: "i" },
        { mask: 0b0001_0100_0001_0011, text: "u" },
        { mask: 0b0001_1100_0001_0010, text: "əɹ" },
        { mask: 0b0001_1000_0001_0011, text: "ɔɹ" },
        { mask: 0b0000_1100_0000_0011, text: "ɑɹ" },
        { mask: 0b0001_1000_0001_0001, text: "ɪɹ" },

        { mask: 0b0000_0000_0000_0001, text: "eɪ" },
        { mask: 0b0000_0000_0000_0010, text: "aɪ" },
        { mask: 0b0000_0100_0000_0000, text: "ɔɪ" },
        { mask: 0b0000_1000_0000_0000, text: "aʊ" },
        { mask: 0b0001_1100_0001_0011, text: "oʊ" },
        { mask: 0b0001_1000_0001_0000, text: "ɛɹ" },
    ],
    in: [
        { mask: 0b0000_0011_0000_0000, text: "m" },
        { mask: 0b0000_0011_0000_0100, text: "n" },
        { mask: 0b0010_0011_1010_1100, text: "ŋ" },
        { mask: 0b0010_0000_1000_1000, text: "p" },
        { mask: 0b0000_0010_1010_0000, text: "b" },
        { mask: 0b0010_0000_1000_1100, text: "t" },

        { mask: 0b0000_0011_1010_0000, text: "d" },
        { mask: 0b0000_0010_1010_1000, text: "k" },
        { mask: 0b0010_0010_1000_1000, text: "g" },
        { mask: 0b0000_0001_1010_0000, text: "d͡ʒ" },
        { mask: 0b0010_0000_1000_0100, text: "t͡ʃ" },
        { mask: 0b0010_0001_1000_1000, text: "f" },

        { mask: 0b0000_0010_1010_0100, text: "v" },
        { mask: 0b0010_0000_1010_1100, text: "ð" },
        { mask: 0b0010_0011_1010_0000, text: "θ" },
        { mask: 0b0010_0001_1010_1000, text: "s" },
        { mask: 0b0010_0010_1010_0100, text: "z" },
        { mask: 0b0010_0011_1000_1100, text: "ʃ" },

        { mask: 0b0000_0011_1010_1100, text: "ʒ" },
        { mask: 0b0010_0010_1010_0000, text: "h" },
        { mask: 0b0010_0000_1010_1000, text: "ɹ" },
        { mask: 0b0010_0000_1010_0100, text: "j" },
        { mask: 0b0000_0000_0000_1100, text: "w" },
        { mask: 0b0010_0000_1010_0000, text: "l" },
    ],
};

Looks like you were missing /ɔɪ/ ("-?-") and didn't quite distinguish /ð/ ("?th?") from /θ/.

Reordered in the PR with spaces and underscores of the phonetic output omitted.
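For anyone wanting to try the table outside the webapp, here is a minimal sketch of a lookup by mask. It is not the decoder's actual code. It assumes the four-digit hex codes quoted earlier in the thread (bc9f, a88c) use the same bit layout as the masks, which those two examples are consistent with. `VOWEL_BITS`, `CONSONANT_BITS`, `lookup` and `decodeGlyph` are hypothetical names, and only three table entries are copied to keep it short.

```javascript
// OR of every mask in the "out" (vowel) group of the table above.
const VOWEL_BITS = 0b0001_1100_0001_0011;
// OR of every mask in the "in" (consonant) group of the table above.
const CONSONANT_BITS = 0b0010_0011_1010_1100;

// Subset of the table above; masks copied verbatim.
const phonemes = {
    out: [
        { mask: 0b0000_1000_0000_0000, text: "aʊ" },
        { mask: 0b0001_1100_0001_0011, text: "oʊ" },
    ],
    in: [
        { mask: 0b0010_0000_1000_1100, text: "t" },
    ],
};

// Exact-match lookup: "" when no bits are set, "?" when the mask is unknown.
function lookup(table, bits) {
    if (bits === 0) return "";
    const hit = table.find((p) => p.mask === bits);
    return hit ? hit.text : "?";
}

// Splits a 16-bit glyph value into its vowel and consonant parts.
// ASSUMPTION: bits outside both groups are returned untouched in `rest`;
// this sketch does not interpret them or decide the phoneme order.
function decodeGlyph(glyph) {
    return {
        vowel: lookup(phonemes.out, glyph & VOWEL_BITS),
        consonant: lookup(phonemes.in, glyph & CONSONANT_BITS),
        rest: glyph & ~(VOWEL_BITS | CONSONANT_BITS) & 0xffff,
    };
}

console.log(decodeGlyph(0xbc9f)); // vowel "oʊ", consonant "t"
console.log(decodeGlyph(0xa88c)); // vowel "aʊ", consonant "t"
```

Both examples leave the top bit (0x8000) in `rest`. What that bit means is not covered by the table, so the sketch leaves it alone.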

Erquint commented 1 year ago

If this PR is merged or you reimplement it as an option, there's a bonus feature you could go for.

You know how sometimes you can be trying to read the proper phonemes in the proper order and yet they just refuse to click into a recognized word in your head because of the perceived artificiality? Then you have to make yourself speak them out loud and try your best to listen to your own voice and interpret it as if you weren't thinking of the written phonemes in your head…

What could help is somebody reading it out loud for you at a press of a button. In case you weren't aware — many if not the majority of TTS engines are based on IPA and can be prompted with it directly using some SSML markup. I usually use this sort of prompt: <phoneme alphabet="ipa" ph="ˈʐvat͡ɕkə" /> [IPA for bubblegum in Russian.]

Might be worth trying to embed some random WASM TTS or something of the sort to add such a feature.
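As a rough sketch of the prompt-building half of that idea, the helper below wraps an IPA string in the same `<phoneme>` markup. `escapeXml` and `ipaToSsml` are hypothetical names, and the bare `<speak>` wrapper is an assumption: some engines want extra attributes on the root element, and not every engine honors `alphabet="ipa"`. Actually speaking the result depends on whichever TTS engine gets embedded, so that part is left out.

```javascript
// Escapes the characters that would break XML attribute values or text nodes.
function escapeXml(s) {
    return s
        .replace(/&/g, "&amp;")
        .replace(/"/g, "&quot;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Builds an SSML prompt for one word.
// `ipa` is the decoded transcription; `fallback` is optional plain text
// that an engine may read if it ignores the phoneme hint.
function ipaToSsml(ipa, fallback = "") {
    return (
        "<speak>" +
        `<phoneme alphabet="ipa" ph="${escapeXml(ipa)}">` +
        escapeXml(fallback) +
        "</phoneme>" +
        "</speak>"
    );
}

console.log(ipaToSsml("aʊt", "out"));
```

The stress mark and tie bars from IPA pass through unchanged, since only XML-special characters are escaped.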

Erquint commented 1 year ago

Made an attempt to spatially group this mess of vowels.

[Image: Cubescript vowel map]

I'm not entirely happy with this notional map, but there are conflicting criteria for how I'd set it up.

Made many attempts to find some rhyme and reason to their construction, but for the most part it seems pretty arbitrary. The only notable traits are:

- A one-off stroke makes a vowel ending in "ɪ".
  - Except for one that ends in "ʊ" instead.
- A one-stroke gap makes a vowel ending in "ɹ".
  - Three of these seem to be constructed from "oʊ" with a gap made somewhere, but "ɛɹ" and "ɪɹ" are not.

Sadly, the direction of the strokes and gaps doesn't seem to follow any inheritable system. I can't seem to systematize them any further than that.
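The two traits above can be checked against the masks from the earlier snippet with a small script. This is only a sketch: `STROKES`, `countStrokes`, `singleStroke` and `gapsFromOu` are hypothetical names, and it assumes each listed bit group is one drawn stroke. Bits 4 and 12 are grouped because they are always set together in the vowel table.

```javascript
// Vowel masks copied verbatim from the table earlier in the thread
// (only the diphthongs and the vowels ending in "ɹ").
const vowels = [
    { mask: 0b0000_0000_0000_0001, text: "eɪ" },
    { mask: 0b0000_0000_0000_0010, text: "aɪ" },
    { mask: 0b0000_0100_0000_0000, text: "ɔɪ" },
    { mask: 0b0000_1000_0000_0000, text: "aʊ" },
    { mask: 0b0001_1100_0001_0011, text: "oʊ" },
    { mask: 0b0001_1100_0001_0010, text: "əɹ" },
    { mask: 0b0001_1000_0001_0011, text: "ɔɹ" },
    { mask: 0b0000_1100_0000_0011, text: "ɑɹ" },
    { mask: 0b0001_1000_0001_0001, text: "ɪɹ" },
    { mask: 0b0001_1000_0001_0000, text: "ɛɹ" },
];

// ASSUMPTION: one entry per drawn stroke; the last entry pairs bits 4 and 12.
const STROKES = [
    0b0000_0000_0000_0001,
    0b0000_0000_0000_0010,
    0b0000_0100_0000_0000,
    0b0000_1000_0000_0000,
    0b0001_0000_0001_0000,
];

// Number of whole strokes present in a bit pattern.
const countStrokes = (bits) =>
    STROKES.filter((s) => (bits & s) === s).length;

const OU = vowels.find((v) => v.text === "oʊ").mask;

// Vowels drawn with exactly one stroke.
const singleStroke = vowels
    .filter((v) => countStrokes(v.mask) === 1)
    .map((v) => v.text);

// For each vowel ending in "ɹ": how many strokes of "oʊ" it lacks.
const gapsFromOu = Object.fromEntries(
    vowels
        .filter((v) => v.text.endsWith("ɹ"))
        .map((v) => [v.text, countStrokes(OU & ~v.mask)])
);

console.log(singleStroke);
console.log(gapsFromOu);
```

With these masks, the single-stroke vowels come out as "eɪ", "aɪ", "ɔɪ" and "aʊ". "əɹ", "ɔɹ" and "ɑɹ" each lack one stroke of "oʊ", while "ɪɹ" lacks two and "ɛɹ" lacks three.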

SaphireLattice / tunic-decoder

New Feature: IPA input/output for words #4