SillyTavern / SillyTavern-Extras

Extensions API for SillyTavern.
GNU Affero General Public License v3.0

talkinghead: how to get manual poser working? #199

Closed Technologicat closed 9 months ago

Technologicat commented 9 months ago

I'd like to produce static expression images using this, but I'm having trouble running the manual poser app.

The manual poser seems to expect a live2d folder; is live2d a dependency, or is a blank folder enough? From a quick look at the source code, the folder doesn't seem to be used except as a target directory. So, I created a blank talkinghead/live2d folder.

Running python -m tha3.app.manual_poser in a terminal at the talkinghead top-level folder starts the manual poser application.

Note that the manual says to use python tha3/app/manual_poser.py instead, but this doesn't work: the app crashes on startup, because sys.path differs from what the imports in the app expect. When running with -m, the cwd is added to sys.path, so looking up modules like tha3.poser or tha3.util works (provided we invoke the python -m ... command in the folder that contains the tha3 folder).

But with the latter option (running a .py file, not a module), it is the script's containing folder, i.e. the tha3/app folder, that is added to sys.path. Absolute imports beginning with tha3 then fail, because the tha3 package itself is not visible from inside tha3/app. (Relative imports wouldn't work either, because in this mode the main script is not treated as part of a package; yeah, Python's import system can be confusing.)
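To see the sys.path difference in isolation, here is a minimal sketch. It builds a tiny, hypothetical stand-in for the real layout (tha3/util.py plus tha3/app/manual_poser.py doing an absolute `from tha3.util import ...`) in a temporary folder, then launches the app both ways. The file contents and the helper name `run_both_ways` are made up for illustration; only the two command lines mirror the ones discussed above.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    """Build a hypothetical tha3-like package and launch its app with -m and as a script."""
    with tempfile.TemporaryDirectory() as root:
        app_dir = Path(root, "tha3", "app")
        app_dir.mkdir(parents=True)
        Path(root, "tha3", "__init__.py").write_text("")
        Path(app_dir, "__init__.py").write_text("")
        Path(root, "tha3", "util.py").write_text("VALUE = 42\n")
        # Print sys.path[0] first, then do an absolute import like manual_poser.py does.
        Path(app_dir, "manual_poser.py").write_text(
            "import sys\n"
            "print(sys.path[0])\n"
            "from tha3.util import VALUE\n"
            "print(VALUE)\n"
        )
        env = dict(os.environ)
        # PYTHONSAFEPATH (Python 3.11+) would stop Python from prepending sys.path[0].
        env.pop("PYTHONSAFEPATH", None)
        env.pop("PYTHONPATH", None)

        def run(*args):
            return subprocess.run([sys.executable, *args], cwd=root, env=env,
                                  capture_output=True, text=True)

        # -m: sys.path[0] is the cwd, which contains the tha3 folder.
        as_module = run("-m", "tha3.app.manual_poser")
        # Script: sys.path[0] is tha3/app, so "tha3" cannot be found.
        as_script = run(str(Path("tha3", "app", "manual_poser.py")))
        return as_module, as_script
```

With -m the import succeeds and the app prints 42; run as a script, it prints a path ending in tha3/app and then dies with ModuleNotFoundError: No module named 'tha3'.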

Anyway, this way I got the app to start, and the window appeared, but I still couldn't get it to do anything.

When I load an image to test with (e.g. talkinghead/tha3/images/example.png), the app raises the following exception:

Traceback (most recent call last):
  File "/home/xxx/SillyTavern-extras/talkinghead/tha3/app/manual_poser.py", line 393, in update_images
    output_image = self.poser.pose(self.torch_source_image, pose, output_index)[0].detach().cpu()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/SillyTavern-extras/talkinghead/tha3/poser/general_poser_02.py", line 61, in pose
    output_list = self.get_posing_outputs(image, pose)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/SillyTavern-extras/talkinghead/tha3/poser/general_poser_02.py", line 65, in get_posing_outputs
    modules = self.get_modules()
              ^^^^^^^^^^^^^^^^^^
  File "/home/xxx/SillyTavern-extras/talkinghead/tha3/poser/general_poser_02.py", line 46, in get_modules
    module = self.module_loaders[key]()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/SillyTavern-extras/talkinghead/tha3/poser/modes/separable_float.py", line 288, in <lambda>
    lambda: load_eyebrow_decomposer(module_file_names[Network.eyebrow_decomposer.name]),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/SillyTavern-extras/talkinghead/tha3/poser/modes/separable_float.py", line 163, in load_eyebrow_decomposer
    module.load_state_dict(torch_load(file_name))
                           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/SillyTavern-extras/talkinghead/tha3/util.py", line 246, in torch_load
    with open(file_name, 'rb') as f:
         ^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'talkinghead/tha3/models/separable_float/eyebrow_decomposer.pt'

Note the path: it is relative, so it gets resolved against the current working directory.

The file tha3/models/separable_float/eyebrow_decomposer.pt does exist (I've actually installed the full model package mentioned in the manual), but since we're already running inside talkinghead, the relative path points to talkinghead/talkinghead/tha3/..., which doesn't exist.
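A minimal sketch of how that relative path resolves from the two working directories. The default directory is the one quoted from create_poser later in this thread, and the /home/xxx prefix comes from the traceback; the helper `resolve_against_cwd` is hypothetical and only mimics how the OS joins a relative path onto the cwd.

```python
from pathlib import PurePosixPath

# Default model directory quoted from create_poser (cwd-relative).
DEFAULT_MODEL_DIR = "talkinghead/tha3/models/separable_float"
MODEL_FILE = f"{DEFAULT_MODEL_DIR}/eyebrow_decomposer.pt"


def resolve_against_cwd(cwd: str, relative: str) -> str:
    """Mimic how a relative path passed to open() is interpreted: joined onto the cwd."""
    return str(PurePosixPath(cwd) / relative)


# Live mode runs from the top level of SillyTavern-extras: the file is found.
live = resolve_against_cwd("/home/xxx/SillyTavern-extras", MODEL_FILE)
# Manual poser started with -m from inside talkinghead/: "talkinghead" appears twice.
manual = resolve_against_cwd("/home/xxx/SillyTavern-extras/talkinghead", MODEL_FILE)
print(live)
print(manual)
```

The first line printed is /home/xxx/SillyTavern-extras/talkinghead/tha3/models/separable_float/eyebrow_decomposer.pt, the second has talkinghead/talkinghead/ in the middle, which is exactly why the file is "missing".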

Trying to run the manual poser from the top-level SillyTavern-extras folder as python -m talkinghead.tha3.app.manual_poser does not work either, because then the module paths will be wrong again.

Any ideas?

EDIT: Note that the live mode of talkinghead is working fine; it's only the manual poser app I'm having trouble with.

Technologicat commented 9 months ago

A quick look at the source code shows that this comes from the loaders in talkinghead/tha3/poser/modes/.

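The call sequence is as follows: load_poser (in talkinghead/tha3/poser/modes/load_poser.py) calls create_poser (in talkinghead/tha3/poser/modes/separable_float.py).

The create_poser function uses the following default directory in its file paths: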

dir = "talkinghead/tha3/models/separable_float"

Because this path is relative to the cwd, a wrong cwd is not detected when the poser is created; the exception only surfaces much later, when the model is first used and the file is not found at the expected path.
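Why "much later"? The traceback shows the model files being opened from self.module_loaders[key]() inside get_modules, via a lambda, i.e. on first use. A minimal, simplified mimic of that pattern follows; the class `LazyPoser` and `torch_load_stub` are hypothetical stand-ins, not the actual tha3 code, and the stub just reads raw bytes instead of calling torch.

```python
from typing import Callable, Dict, Optional


def torch_load_stub(file_name: str) -> bytes:
    """Hypothetical stand-in for tha3.util.torch_load: just opens and reads the file."""
    with open(file_name, "rb") as f:
        return f.read()


class LazyPoser:
    """Simplified mimic of the lazy loading visible in the traceback."""

    def __init__(self, module_file_names: Dict[str, str]):
        # Store zero-argument closures only; no file is opened yet.
        # The default argument binds each name now, avoiding the late-binding pitfall.
        self.module_loaders: Dict[str, Callable[[], bytes]] = {
            key: (lambda name=name: torch_load_stub(name))
            for key, name in module_file_names.items()
        }
        self.modules: Optional[Dict[str, bytes]] = None

    def get_modules(self) -> Dict[str, bytes]:
        # The first call runs every loader; a bad path raises only here.
        if self.modules is None:
            self.modules = {key: load() for key, load in self.module_loaders.items()}
        return self.modules
```

Constructing a LazyPoser with a nonexistent path succeeds silently; the FileNotFoundError appears only when get_modules() is first called, which in the real app happens when an image is loaded and posed.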

But since the live mode is working fine, and it runs with a different cwd (the top level of SillyTavern-extras), I'd wager the default directory should not be changed.

Maybe the easiest solution is to just adapt the imports in the manual poser, so that we could run both apps from the top level of SillyTavern-extras?

EDIT: Meh, just changing the imports in manual_poser.py to from talkinghead.tha3... doesn't work: then talkinghead/tha3/poser/modes/load_poser.py (which I suppose is shared between the live mode and the manual poser app, so we can't make any changes there that would break live mode) can't find tha3.poser.modes.separable_float.

The create_poser function takes a module_file_names dict, which can take a path for each model part, but the caller load_poser doesn't take such an argument from its caller, and in any case, using it would break the division of responsibilities (since it's each model's job to know its filenames).

Probably the best solution would be to allow providing an override for the default directory (while keeping the original filenames), and use that in the manual poser?
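A minimal sketch of that idea, under stated assumptions: `model_file_paths`, its `model_dir` parameter and `models_dir_next_to` are hypothetical names, not the actual change made in the PR. The filenames stay the model's responsibility; only the directory they are joined onto can be overridden, and leaving it as None keeps the current cwd-relative default, so live mode is unaffected.

```python
from pathlib import Path
from typing import Dict, Optional

# Current cwd-relative default, as quoted from create_poser.
DEFAULT_MODEL_DIR = "talkinghead/tha3/models/separable_float"


def model_file_paths(filenames: Dict[str, str],
                     model_dir: Optional[str] = None) -> Dict[str, str]:
    """Join each model part's own filename onto a directory.

    filenames: part name -> bare filename (still owned by the mode module).
    model_dir: optional override; None keeps the original default.
    """
    base = Path(model_dir) if model_dir is not None else Path(DEFAULT_MODEL_DIR)
    return {part: str(base / name) for part, name in filenames.items()}


def models_dir_next_to(script: str) -> str:
    """For .../talkinghead/tha3/app/manual_poser.py, parents[1] is .../talkinghead/tha3."""
    return str(Path(script).parents[1] / "models" / "separable_float")
```

In the manual poser one could then pass model_dir=models_dir_next_to(str(Path(__file__).resolve())), which no longer depends on the cwd, while app.py keeps calling without an override.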

Maybe I'll just wait if anyone has better ideas. :)

Cohee1207 commented 9 months ago

I have no idea, sorry, I didn't develop it. Honestly, this module feels like an unnecessary burden to support. The results are uncanny and it's very unoptimized. Any reason you're specifically looking into it rather than a real live2d plugin?

Technologicat commented 9 months ago

Ok.

Maybe I should explain my use case. It's not AItubing. :)

At the moment, I'm looking into LLMs mainly for two target applications: retrieval-powered question answering for serious scientific use (to be able to skim papers faster), and AI-powered storywriting for personal use. I'm also interested in an AIDungeon-like interactive text adventure/roleplay use case, but haven't had the time to properly set up and play an adventure yet.

The thing is, I find that interfacing with a faceless LLM feels cold. So I want the system to present itself as a virtual anime character.

Of course, I'm aware that an LLM is a simulator, not a character (a good writeup of the discussion surrounding this is Shanahan et al., 2023), but having it present itself as a specific character yields a nice user interface. Original character, mind you: I'm not interested in making existing anime or video game characters answer my questions about numerical methods and such.

So, I was intrigued by the prospect of an automatic tool to generate character expressions. If I only need to provide one static image per character, that's much more manageable than 28 (when using classify powered by DistilBERT). This would allow agile editing of the character's visual appearance.

Instead of a whole evening of inpainting in Stable Diffusion, creating a new character (or modifying the look of an existing one) would only take 30 minutes at most, including everything: playing around with the Stable Diffusion prompt, rerolling txt2img to get the perfect shot, and finally automatically removing the background with rembg.

As for why, agile is not just a software development methodology, it's a lifestyle.

Generating a set of static images by an offline (batch) process would be fine, which is why I was looking into the manual poser.

The live mode is just a nice bonus. It would make the character feel more alive, and as I said, it works fine, but as you said, it needs optimization.

As for live2d, it's proprietary so I'd rather avoid it, and I'm not even sure if it runs on Linux. Their download page didn't say.

Anyway, while I don't have a lot of time for extra software development projects, I might look into whether I can get the manual poser working.

In any case I think talkinghead is a really cool technology demo, so if possible, I'd prefer keeping it around. This and websearch are why I installed extras in the first place.

Technologicat commented 9 months ago

I got the manual poser working!

Only needed to pass the correct model paths without breaking the app.py use case.

Small PR to follow.

Obviously, a single static image has its limitations, but in a reasonable parameter range, at least for the example character... wow. Just, wow.

This is exactly what I was looking for.

Technologicat commented 9 months ago

Posted the PR. See #203.

Technologicat commented 9 months ago

Closing this since the app works now. Let's track the ongoing changes in the PR ticket.