aarppe opened 4 years ago
For recordings, we should also be able to distinguish, in the inflectional paradigms, between a) a recording spoken by a person (like: 🧑🏽🔈), and b) a recording generated by a speech synthesizer (like: 🤖🔈).
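To make the distinction concrete, here is a minimal sketch of how the two kinds of audio provenance could be represented, with the emoji as the visual indicator. The enum and function names are illustrative assumptions, not existing project code.

```python
# Hypothetical sketch: modelling who (or what) produced a recording.
from enum import Enum
from typing import Optional


class RecordingSource(Enum):
    """Provenance of the audio snippet attached to a word-form."""

    HUMAN = "🧑🏽🔈"        # recording spoken by a person
    SYNTHESIZED = "🤖🔈"   # recording generated by a speech synthesizer


def recording_indicator(source: Optional[RecordingSource]) -> str:
    """Emoji shown next to a word-form; empty when no recording exists."""
    return source.value if source is not None else ""
```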
Here's the visualization we discussed today (UPDATE: removed squiggly lines due to English spell-checking):
[below with English translation pop-up:]
We have robo-speech available in this case for all inflected word-forms, indicated by 🤖🔈, except for the few word-forms for which recording(s) spoken by a person exist, indicated by 🧑🏽🔈. Note also that sometimes a word-form might exist in a corpus (i.e. it has been observed), but we do not have a spoken recording of that word, and thus only a robo-snippet can be made available.
Note also that I'm using 1) an em-dash to indicate a lacuna (a word-form that doesn't exist for this particular lemma but is possible for other members of this word-class), in contrast to 2) grey color to indicate a cell for which no word-form can exist in this paradigm at all. Such a cell is an artifact of how the table is organized for this particular word-class (deliberately and explicitly showing the impossibility of a form/feature combination), potentially reflecting features available for other similar word-classes but not this one.
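Putting the distinctions above together, a cell can be in one of four states (observed, unobserved, lacuna, impossible), independently of whether a human or robot recording is available. The following is a hedged sketch of that state model; the class and field names are assumptions for illustration, not the dictionary's actual data structures.

```python
# Hypothetical sketch: the cell states and recording indicators described above.
from dataclasses import dataclass
from enum import Enum, auto


class CellStatus(Enum):
    OBSERVED = auto()     # word-form attested in a corpus
    UNOBSERVED = auto()   # word-form possible but not (yet) attested
    LACUNA = auto()       # doesn't exist for this lemma; rendered as an em-dash
    IMPOSSIBLE = auto()   # form/feature combination cannot exist; rendered as a grey cell


@dataclass
class ParadigmCell:
    status: CellStatus
    wordform: str = ""
    human_recording: bool = False   # 🧑🏽🔈
    robot_recording: bool = False   # 🤖🔈

    def display(self) -> str:
        """Text content of the cell, with any recording indicators appended."""
        if self.status is CellStatus.LACUNA:
            return "\u2014"  # em-dash marking the lacuna
        if self.status is CellStatus.IMPOSSIBLE:
            return ""        # the greying-out would be handled by the template/CSS
        marks = ("🧑🏽🔈" if self.human_recording else "") + ("🤖🔈" if self.robot_recording else "")
        return f"{self.wordform} {marks}".rstrip()
```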
Since for reasons of storage space we might not be able to provide a generated recording for each and every form of every paradigm, there could be a substantial number of lemmas with word-forms that would have neither recording indicator. Also, to reduce clutter, we might implement an option that lets users choose whether to see which word-forms have recordings (with the above symbols) or not (without any symbols, but then also without access to such recordings).
Besides all the above, we might even have pop-ups for the various cells, providing a generated English translation (an attempt above for kôhkomak --> your grandmothers; your respected female elders). Also, we can show the frequency of a word-form in corpora (already imported and included in the underlying data structure, but not yet shown). And we might also have pop-ups indicating what the various morphemes in the word-forms mean - though where we would be able to squeeze that in remains to be seen/explored.
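To illustrate how such pop-ups could be fed, here is a small hypothetical sketch bundling the three pieces of information mentioned (translation, corpus frequency, morpheme glosses) for a single cell; the field names are assumptions, and the frequency and gloss values would come from the underlying data.

```python
# Hypothetical sketch: the content a per-cell pop-up could display.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CellPopup:
    wordform: str
    english_translation: Optional[str] = None   # e.g. kôhkomak --> "your grandmothers; your respected female elders"
    corpus_frequency: Optional[int] = None      # already in the data structure, not yet shown
    morpheme_glosses: Dict[str, str] = field(default_factory=dict)  # morpheme -> meaning

    def lines(self) -> List[str]:
        """Rows of text a tooltip/pop-up could show for this cell."""
        out = [self.wordform]
        if self.english_translation:
            out.append(self.english_translation)
        if self.corpus_frequency is not None:
            out.append(f"corpus frequency: {self.corpus_frequency}")
        out.extend(f"{morpheme}: {gloss}" for morpheme, gloss in self.morpheme_glosses.items())
        return out
```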
@kobexamoh In the above comment, I hope I have sketched out all the possible combinations of features we might want to be able to show. As discussed earlier, we might have an option to show, or not to show, word-forms for which we have a human or robot recording - the former resulting in more clutter than the latter, so it could be left to user preference.
Some further mock-ups. These represent two strategies: 1) using toggles to add across-the-board information on top of a basic paradigm; and 2) showing the full extent of available information as pop-ups for individual cells. Both strategies could be deployed at the same time (a sketch of the toggle combinations follows the list below).
a. Initial set-up: only show, paradigm-wide, whether a word-form has been observed or not
b. Also show morpheme boundaries paradigm-wide
c. Also show recordings paradigm-wide
d. Also show both morpheme boundaries and recordings paradigm-wide
e. Cell-wise pop-up
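As a rough sketch of strategy 1, layouts a to d can be derived from two independent paradigm-wide toggles on top of the basic view; the option names below are hypothetical, purely to illustrate the combinations (strategy 2, the cell-wise pop-up in e, would sit alongside these).

```python
# Hypothetical sketch: two paradigm-wide toggles generating layouts a-d.
from dataclasses import dataclass


@dataclass
class ParadigmViewOptions:
    show_morpheme_boundaries: bool = False  # enabled in layouts b and d
    show_recordings: bool = False           # enabled in layouts c and d

    def describe(self) -> str:
        if self.show_morpheme_boundaries and self.show_recordings:
            return "d. morpheme boundaries and recordings"
        if self.show_morpheme_boundaries:
            return "b. morpheme boundaries"
        if self.show_recordings:
            return "c. recordings"
        return "a. basic paradigm (observed vs. unobserved only)"
```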
@nienna73 When getting to the implementation of showing morpheme boundaries (and other things), the sketches above may be worth reviewing.
I added the robot and person emojis; how is this looking so far?
I did choose the "person with curly hair" emoji, but it always comes off a little masculine to me.
Looks nice. Eddie was using the gender-neutral generic person emoji 🧑 (with a darker skin/hair tone). But with actual people speaking, we probably could use the gender-specific emojis, since we should know that information. On a lighter note, we could have speakers choose the emoji they'd prefer - perhaps later on.
@kobexamoh Here's a sketch of the various layouts combining 1) observed vs. 2) unobserved forms vs. 3) lacunae + 4) morpheme boundaries + 5) recordings.
crk-itwêwina-paradigm-layout-mockups.docx