cyber-meow / anime_screenshot_pipeline

A 99% automated pipeline for constructing training sets from anime and more for text-to-image model training
MIT License
201 stars, 13 forks

Some ideas on outfit #49

Open TomLucidor opened 8 months ago

TomLucidor commented 8 months ago

There are a few scenarios regarding outfits (they can co-occur):

  1. One character has multiple outfits that are routinely worn, and one-off outfits should be thrown away
  2. Multiple characters wear the same type of outfit with minimal or slight variation

Since full-body segmentation already exists, it would be a natural starting point for building a clustering technique over outfit embeddings.
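To make the idea concrete, here is a rough sketch of how the two scenarios above could be handled once outfit embeddings are available. It assumes each row of `embeddings` is a vector produced by some outfit encoder applied to a segmented full-body crop (no such encoder is provided by this pipeline as far as this thread states). The function name `cluster_outfits` and the parameters `sim_threshold` and `min_size` are hypothetical, and the connected-components grouping is just one simple choice, not a recommendation over DBSCAN or similar.

```python
import numpy as np

def cluster_outfits(embeddings, sim_threshold=0.8, min_size=3):
    """Group embeddings linked by cosine similarity >= sim_threshold.

    Returns one label per row. Rows belonging to groups smaller than
    min_size get label -1 (treated here as one-off outfits to discard).
    Assumes no embedding is the zero vector.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalize rows
    adjacent = (x @ x.T) >= sim_threshold             # similarity graph
    n = len(x)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    next_label = 0
    for start in range(n):
        if visited[start]:
            continue
        # Flood-fill one connected component of the similarity graph.
        stack, members = [start], []
        visited[start] = True
        while stack:
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(adjacent[i] & ~visited):
                visited[j] = True
                stack.append(int(j))
        # Keep only groups large enough to count as a routine outfit.
        if len(members) >= min_size:
            labels[members] = next_label
            next_label += 1
    return labels
```

Running this per character would address scenario 1 (small groups are dropped), while running it across characters would surface scenario 2 (one group containing several characters). Both thresholds would need tuning on real data.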

See also the notes made for a segmentation library: https://github.com/SkyTNT/anime-segmentation/issues/13

cyber-meow commented 8 months ago

Indeed, what is missing is just some contrastive learning for outfits. This should in principle be the same as CCIP for characters, and the dataset is already there (ask Narugo). It is just that no one I know really has time to work on it.
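For readers unfamiliar with the term, a generic contrastive objective looks roughly like the sketch below. This is a plain InfoNCE-style loss written in NumPy for illustration only. It is not CCIP's training code, and whether CCIP uses this exact loss is not stated in this thread. The function name `info_nce_loss` and the assumption that row `i` of `anchors` and row `i` of `positives` show the same outfit are hypothetical.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: row i of `anchors` should match row i of
    `positives` and mismatch every other row in the batch."""
    a = np.asarray(anchors, dtype=float)
    p = np.asarray(positives, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                     # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching pair (the diagonal) as the target.
    return float(-np.mean(np.diag(log_prob)))
```

The loss is low when images of the same outfit embed close together and different outfits embed far apart, which is the property a clustering step would rely on.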

More generally speaking, I would like to see more fundamental groundwork for anime images, including fine-tuning all major SSL models (DINO, MAE, CLIP, AIM, etc.), VLMs such as LLaVA, and other vision models such as YOLO-World and SAM for anime. This would make our lives much easier.

That being said, I am too busy to work further on this project or on anime-related things at the moment. I am working on text-to-image, but on more fundamental problems rather than anime for now.

TomLucidor commented 3 months ago

I think newer research is good in general, but I am not sure whether the Open Model Initiative will shake the monopoly. VLMs, though, will need even more data to operate, on top of finding open LLM backends (and the Transformer vs. Mamba-esque debate is fun too).

P.S. Is this good? https://github.com/CartoonSegmentation/CartoonSegmentation