HelloVision / HelloMeme

The official HelloMeme GitHub site
https://songkey.github.io/hellomeme/
MIT License

Context Shift - Warping Output #16

Open inferno46n2 opened 1 week ago

inferno46n2 commented 1 week ago

Hey there!

I was wondering if your team could provide some helpful tips on how to avoid these really obvious context shifts every 16 frames?

For reference, the workflow I did was:

1. Stylize the first frame of the driving video with img2img + ControlNets
2. Use this checkpoint, as I found the Realistic Vision checkpoints ruined the big eyes and I wanted to force it back to regular human proportions: https://civitai.com/models/114413/disney-stylev1
3. Default settings

https://github.com/user-attachments/assets/8160f609-959c-4e45-9f58-2cc099c19815

songkey commented 1 week ago

This remains an unresolved issue for us. AnimateDiff performs localized modeling over 16-frame (or 12-frame) time segments, making it difficult to ensure consistency between segments. I am working on ways to improve this.

Currently, you can try the following methods:

  1. Use the latest code and try a larger patch overlap;
  2. Experiment with different checkpoints (though this may not always work), as we’ve found that different checkpoints vary in their continuity performance, although there tends to be a trade-off between continuity and the richness of expressions.
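To make the "patch overlap" suggestion concrete: AnimateDiff-style pipelines split a long clip into fixed-length segments that share some frames with their neighbors. A minimal sketch of that windowing (the function name and parameter defaults here are illustrative, not HelloMeme's actual API):

```python
def sliding_windows(num_frames, window=16, overlap=4):
    """Yield (start, end) frame ranges that tile the clip with
    overlapping fixed-length segments (AnimateDiff-style)."""
    step = window - overlap
    start = 0
    while True:
        end = min(start + window, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += step

# A 40-frame clip with the defaults yields three segments:
# (0, 16), (12, 28), (24, 40)
print(list(sliding_windows(40)))
```

Increasing `overlap` gives adjacent segments more shared frames to agree on, which tends to soften the visible shift at segment boundaries, at the cost of more inference passes per clip.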

https://github.com/user-attachments/assets/b8046fc4-ef73-4413-b82d-f96f8bd21796

inferno46n2 commented 1 week ago

> This remains an unresolved issue for us, as AnimateDiff performs localized modeling over 16-frame (or 12-frame) time segments, making it difficult to ensure consistency between time segments. I am working on ways to improve this.
>
> Currently, you can try the following methods:
>
> 1. Use the latest code and try a larger patch overlap;
> 2. Experiment with different checkpoints (though this may not always work), as we’ve found that different checkpoints vary in their continuity performance, although there tends to be a trade-off between continuity and the richness of expressions.
>
> overlap-example_fps15.mp4

Thanks for the reply!

I’ve been using AnimateDiff since it was released and have produced some incredibly stable content with it.

A suggestion would be to simply make your bespoke components (RefNet, your ControlNet, etc.) work with this node pack, as it’s the gold standard for AnimateDiff in ComfyUI. It has features like FreeNoise, for example, which drastically improve results.

https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved

https://github.com/Kosinkadink/ComfyUI-Advanced-ControlNet

Long story short: your implementation needs to be more compatible with existing infrastructure.
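For context, FreeNoise-style approaches reduce boundary shifts by blending the frames where adjacent windows overlap, rather than hard-switching from one segment to the next. A rough sketch of such blending (this is not HelloMeme's or AnimateDiff-Evolved's actual code; the array shapes and ramp weights are assumptions):

```python
import numpy as np

def blend_overlaps(segments, starts, total_frames):
    """Combine per-segment outputs into one clip, weighting each frame
    by a linear ramp so overlapping frames cross-fade between segments."""
    # segments: list of arrays shaped (win, ...); starts: first-frame indices
    tail = segments[0].shape[1:]
    acc = np.zeros((total_frames,) + tail)
    wsum = np.zeros((total_frames,) + (1,) * len(tail))
    for seg, s in zip(segments, starts):
        win = seg.shape[0]
        # triangular ramp: low weight at segment edges, high in the middle
        ramp = np.minimum(np.arange(1, win + 1), np.arange(win, 0, -1))
        w = ramp.astype(float).reshape((win,) + (1,) * len(tail))
        acc[s:s + win] += seg * w
        wsum[s:s + win] += w
    return acc / wsum
```

In the overlap region, frames are a weighted average of both segments, so the transition between 16-frame chunks is gradual instead of an abrupt context shift.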

inferno46n2 commented 1 week ago

> This remains an unresolved issue for us, as AnimateDiff performs localized modeling over 16-frame (or 12-frame) time segments, making it difficult to ensure consistency between time segments. I am working on ways to improve this.
>
> Currently, you can try the following methods:
>
> 1. Use the latest code and try a larger patch overlap;
> 2. Experiment with different checkpoints (though this may not always work), as we’ve found that different checkpoints vary in their continuity performance, although there tends to be a trade-off between continuity and the richness of expressions.
>
> overlap-example_fps15.mp4

Just tested your latest code: a significant improvement. Much, much better, thank you.