Matamata-Animator / Matamata

Automatically create lip-synced animations
https://youtu.be/U4W1bv_cai0

Pose generation through emotion detection #20

Closed Yey007 closed 3 years ago

Yey007 commented 3 years ago

Is your feature request related to a problem? Please describe. Not a problem, but here is a description: I'd like the program to use some sort of API or library to detect emotions from my voice/words, and then let me edit its selections later if necessary.

Describe the solution you'd like If a certain flag is enabled, the script should contact an API, using an API key from the environment, to detect emotions. It should then look up poses for some of the basic emotions in a JSON file and insert those poses into the script it generates in transcriber.py, or optionally into a timestamps file. It should wait for the user to edit the files, confirming with a prompt that they want to go through with the operation, and then generate as usual.
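A minimal sketch of that flow in Python, for discussion. The `poses.json` file, the `WATSON_API_KEY` environment variable name, and all function names here are assumptions for illustration, not the project's actual API:

```python
import json
import os


def load_pose_map(path):
    # Hypothetical user-editable JSON file mapping emotion names
    # to pose names, e.g. {"joy": "happy_face", "neutral": "idle"}.
    with open(path) as f:
        return json.load(f)


def pose_for_emotion(pose_map, emotion, default="neutral"):
    # Fall back to a default pose for emotions without an entry.
    return pose_map.get(emotion, pose_map.get(default, default))


def get_api_key():
    # Per the proposal, the key comes from the environment.
    # "WATSON_API_KEY" is an assumed variable name.
    key = os.environ.get("WATSON_API_KEY")
    if not key:
        raise RuntimeError("WATSON_API_KEY is not set")
    return key


def confirm(prompt="Edit the generated files, then enter y to continue: "):
    # Pause so the user can hand-edit the script/timestamps file
    # before generation proceeds, as described above.
    return input(prompt).strip().lower() == "y"
```

The lookup-with-fallback keeps generation working even when the detector returns an emotion the user hasn't mapped.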

Describe alternatives you've considered Using some sort of library on the client instead of an API, but that seems harder (although it would be cool to have as an option).

Additional context None.

Yey007 commented 3 years ago

I know the feature is a little gimmicky but being able to generate an entire dynamic video from just voice sounds really cool.

Yey007 commented 3 years ago

By the way, I would be happy to PR something like this at some point if you're fine with that.

effdotsh commented 3 years ago

Sounds pretty cool, and yeah if you want to open a PR go ahead!

Yey007 commented 3 years ago

I've looked into some of the APIs we can use. Libraries in general seem to be quite limited or require training, so I'm not too keen on those. The most promising solution so far is the Watson Tone Analyzer API, which provides 2,500 API calls per month, at 100 sentences per call for sentence-level analysis (what we want) and 1,000 sentences per call for document-level analysis. That gives us a total of 250,000 sentences per month, which should be enough for anyone using the tool, since each user will have to make an individual account. I am not sure how this will work with #22, but we can introduce it as an English-only feature first if necessary.
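For reference, Tone Analyzer's sentence-level results come back as JSON with a `sentences_tone` array, each entry carrying scored tones. A sketch of picking the dominant tone per sentence (the sample response below is illustrative, shaped like the documented output but not real API data):

```python
def dominant_tones(response):
    """Map each sentence to its highest-scoring tone from a
    Tone Analyzer sentence-level response dict."""
    results = []
    for sent in response.get("sentences_tone", []):
        tones = sent.get("tones", [])
        if tones:
            best = max(tones, key=lambda t: t["score"])
            results.append((sent["text"], best["tone_id"]))
        else:
            # The API omits tones it isn't confident about.
            results.append((sent["text"], None))
    return results


# Illustrative response in the Tone Analyzer shape.
sample = {
    "sentences_tone": [
        {"sentence_id": 0, "text": "This is great!",
         "tones": [{"score": 0.91, "tone_id": "joy", "tone_name": "Joy"}]},
        {"sentence_id": 1, "text": "Moving on.",
         "tones": []},
    ]
}
```

Sentences with no confident tone would map to the default pose downstream.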

effdotsh commented 3 years ago

I think it's probably fine to just have an English-only version. Both this and #22 will require a separate flag, and there could just be a note saying that the two aren't compatible.

As for how this will work, I think the easiest way would probably be to detect the emotions and the timestamps for when emotions change, then generate a timestamps file from that. As for how the timestamps system works, check the readme or #18.
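One way to sketch that step: given (time, emotion) change points and an emotion→pose map, emit one line per pose change. The `seconds POSE` line format here is an assumption for illustration only; the real timestamps syntax is whatever the readme/#18 specify:

```python
def emotion_changes_to_timestamps(changes, pose_map, default_pose="neutral"):
    """Turn (seconds, emotion) change points into timestamps-file lines.

    changes: list of (float seconds, str emotion), sorted by time.
    pose_map: dict mapping emotion names to pose names.
    """
    lines = []
    last_pose = None
    for seconds, emotion in changes:
        pose = pose_map.get(emotion, default_pose)
        # Only write a line when the pose actually changes,
        # so repeated emotions don't bloat the file.
        if pose != last_pose:
            lines.append(f"{seconds:.2f} {pose}")
            last_pose = pose
    return "\n".join(lines)
```

Collapsing consecutive identical poses keeps the generated file small enough to hand-edit before the confirmation prompt.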

Yey007 commented 3 years ago

I think that could work. I was planning on doing it with the script file rather than the new timestamps file, so we can do it before we send anything to gentle and don't have to do any extra timing calculations. However, it should be possible to do it after gentle if you feel that's the way to go.

effdotsh commented 3 years ago

That would also work. My concern is that it would be best to use the same script either way, so you get the same animation quality whether or not you're using emotion detection. I guess either way works, though. Also, I'll open another feature request for allowing different text transcription services (watson/ibm/gcp), in case people believe those services will be tangibly better than the default.