MycroftAI / mycroft-core

Mycroft Core, the Mycroft Artificial Intelligence platform.
https://mycroft.ai
Apache License 2.0

It should be possible to set the (grammatical) gender of Mycroft #1865

Closed ftyers closed 2 months ago

ftyers commented 6 years ago

For some languages (Portuguese, Spanish) it will be necessary to set the gender of Mycroft according to the voice (female, male) and allow this to be referenced in the dialogues. For example, in Portuguese the word for "thank you" is obrigado (male speaker) or obrigada (female speaker). Or "Qué tal Mycroft?" ... "Estoy cansado" (male speaker) / "Estoy cansada" (female speaker).

For other languages, it would be convenient to know the gender of the person speaking to Mycroft to be able to choose the correct greeting. E.g. Icelandic sæll for addressing a male, and sæl for addressing a female. This could be either estimated at utterance time using the speech signal, or alternatively set on a per-user basis.

penrods commented 6 years ago

Human languages continue to fascinate me!

For the first one (knowing the "gender" of the speaking device) we can consider this going forward. This requires some plumbing throughout the world to associate a gender with the voice, as right now voices have no such info.

For the second one, I don't think there is a practical solution in the near term. Distinguishing a speaker's gender from the voice just isn't going to be reliable. But it is an interesting thing to keep in mind as we evolve the technology.

ftyers commented 6 years ago

There are lots of fun typological things to take into account :) I'll try to keep posting them here as I think of them. For the second one, a short-term solution would be to allow a gender to be associated with a particular user.

In the end, the more linguistically extensible the platform is, the easier it will be to adapt. Would it be useful to you guys to have a document that discusses language typology and dialogue systems?

KathyReid commented 6 years ago

One of the elements to consider here is also the choice and agency of the speaker. Some speakers may have a male-sounding voice, but choose to express their gender as female, and vice versa. Some speakers may not associate with one of the binary genders (male/female). So to be inclusive and respectful, we also need to consider these use cases and how to handle them in a way that provides a positive user experience.

Side note: I'm reading "Made by Humans" by Ellen Broad at the moment, it's such a mind opener https://www.goodreads.com/book/show/39806467-made-by-humans

ftyers commented 6 years ago

@KathyReid yes, but this should be programmable, we shouldn't have "male morphology with female voice" or "male morphology with male voice" as the only options.

KathyReid commented 6 years ago

> In the end, the more linguistically extensible the platform is, the easier it will be to adapt. Would it be useful to you guys to have a document that discusses language typology and dialogue systems?

Yes! Absolutely! Thank you for volunteering!

JarbasAl commented 5 years ago

as a Portuguese speaker the thank you example really annoys me, i proposed a solution for that, the second part is trickier

i played with detecting gender from utterance audio using GMM-HMMs, maybe it was the training data but results were mixed, mostly correct but i think we need a strong baseline before integrating something like this in the pipeline

associating per user could work, and would account for gender identity preferences, but we don't yet have speaker identification in the pipeline either

in the end i think a mixed approach would be best, if we get decent accuracy in gender detection that would be a good default case, allowing per user settings to override this would also be desired
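
The mixed approach described above could be sketched roughly as follows. This is only an illustration: the function name, the confidence score and the 0.9 threshold are all hypothetical, and nothing like this exists in mycroft-core.

```python
from typing import Optional


def resolve_user_gender(user_setting: Optional[str],
                        detected: Optional[str],
                        confidence: float,
                        threshold: float = 0.9) -> Optional[str]:
    """Pick the gender used to address the person speaking.

    All names and the threshold are assumptions made for this sketch.
    """
    # An explicit per-user setting always wins over detection.
    if user_setting is not None:
        return user_setting
    # Otherwise fall back to detection, but only when it is confident.
    if detected is not None and confidence >= threshold:
        return detected
    # None means "unknown": the caller should use a neutral phrasing.
    return None
```

Returning None rather than guessing leaves room for a neutral greeting when neither source is reliable.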

JarbasAl commented 5 years ago

https://github.com/JarbasAl/voice-gender

gender recognition seems to work well without complicated machine learning, i will play around with integrating this into mycroft-core

ChanceNCounter commented 4 years ago

RFC:

After some excellent discussion in chat between myself, @JarbasAl and @KathyReid, a proposal has come together: add metadata to dialog files, on a per-line basis. Use a delimiter that's unlikely ever to appear in an actual .dialog file, like ^

Perhaps Mycroft is sleepy, and speaking Spanish, invoking a hypothetical es-es/i.am.tired.dialog:

estoy cansado ^ gender=masculine; some-other-metadata-here=value
estoy cansada ^ gender=feminine

The work on the dialog renderer is pretty easy. One of us might throw it together just for good measure (I'm writing this on my way to a coffee break). In short, the renderer would str.split() each line on the delimiter (naturally), parse result[1:] for gender, and discard entries which don't correspond with the configured gender.

It might be wise to cache the first valid option that satisfies the "default" gender, in case the renderer does not find any lines that match its configured gender.
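
A minimal sketch of that filtering step, to make the proposal concrete. It is not the actual mycroft-core renderer: the function names, the `;`-separated `key=value` parsing, the choice of "masculine" as the default gender, and the rule that untagged lines match any gender (so existing .dialog files keep working) are all assumptions.

```python
DELIMITER = "^"  # the delimiter proposed in this RFC


def parse_line(line):
    """Split one dialog line into (text, metadata dict)."""
    text, _, meta = line.partition(DELIMITER)
    metadata = {}
    for pair in meta.split(";"):
        key, sep, value = pair.partition("=")
        if sep:  # skip empty or malformed fragments
            metadata[key.strip()] = value.strip()
    return text.strip(), metadata


def select_lines(lines, gender, default_gender="masculine"):
    """Return the phrases usable for the configured gender."""
    matches = []
    fallback = None  # first line that satisfies the default gender
    for line in lines:
        if not line.strip():
            continue
        text, meta = parse_line(line)
        line_gender = meta.get("gender")
        # Assumption: lines without gender metadata suit any gender.
        if line_gender is None or line_gender == gender:
            matches.append(text)
        elif fallback is None and line_gender == default_gender:
            fallback = text
    if matches:
        return matches
    return [fallback] if fallback is not None else []
```

With the two example lines above, `select_lines(lines, "feminine")` keeps only "estoy cansada", and a configured gender with no matching line falls back to the cached default-gender phrase.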

ftyers commented 4 years ago

@ChanceNCounter this seems like a fairly ad hoc solution and algorithm. Have you done some research on how this kind of thing is done in NLP in general? It seems you could use the Feature=Value pairs from the Universal Dependencies project. This would mean that you could take advantage of existing resources: e.g. if you want to parse human input, you could use the data from UD to train a parser and get the same Feature=Value pairs back, instead of having to do conversion (nearly always non-trivial) or annotate your own data. In addition, the decision on which item to select could be made by unification or some other known algorithm.
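
As a rough illustration of what that could look like, the sketch below tags each phrase with a UD-style feature string (`Gender=Fem|Number=Sing`, using the `|` separator from the CoNLL-U FEATS column) and keeps the phrases whose features unify with the configured context. The function names are hypothetical, and this is a flat, simplified unification that does not handle comma-separated multi-values such as `Gender=Fem,Masc`.

```python
def parse_feats(feats):
    """Parse a CoNLL-U style FEATS string such as 'Gender=Fem|Number=Sing'."""
    if not feats or feats == "_":  # '_' marks an empty FEATS column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def unify(a, b):
    """Merge two feature sets; return None if a shared feature conflicts."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


def compatible_phrases(candidates, context_feats):
    """Keep the phrases whose features unify with the context features."""
    context = parse_feats(context_feats)
    return [text for text, feats in candidates
            if unify(parse_feats(feats), context) is not None]
```

A phrase that leaves a feature unspecified unifies with any value of it, so untagged or partially tagged phrases stay eligible.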

ChanceNCounter commented 4 years ago

@ftyers just to clarify, we're not parsing human input here. We're trying to introduce an additional, lower level of granularity to existing banks of output phrases.

The dialog renderer class is used to pull a line of dialog from a particular file, usually peculiar to a skill. Right now, the example Spanish dialog above would probably read,

estoy cansado

and that's it.

Our hope here is to add gendered output to existing skills, without any breaking changes to the existing dialog renderer.

ftyers commented 4 years ago

@ChanceNCounter I understand that point, my point was that if "in the future" you want to parse human input you could take advantage of this existing massively multilingual resource. Rather than make the categories up ad hoc you could use ones that are predefined, and then avoid reinventing the wheel, future issues with conversion or annotation, etc.

In addition, I think (without too much further investigation) that the general approach is reasonably sound; at the very least it is an incremental improvement. As I mentioned before, however, thinking about future usage scenarios might avoid duplication of effort later. For English and Portuguese effort is cheap. For the majority of the world's languages it is not.

forslund commented 2 months ago

Closing Issue since we're archiving the repo