Open TomLucidor opened 8 months ago
Indeed, what is missing is just some contrastive learning for outfits. In principle this should be the same as CCIP for characters, and the dataset is already there (ask Narugo). It is just that no one I know really has the time to work on it.
More generally speaking, I would like to see more fundamental groundwork for anime images, including fine-tuning all the major SSL models (DINO, MAE, CLIP, AIM, etc.), VLMs such as LLaVA, and other vision models such as YOLO-World and SAM for anime. This would make our lives much easier.
That being said, I am too busy to work further on this project or on anime stuff at the moment. I am working on text-to-image, but on more fundamental things and not anime for now.
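To make the contrastive idea concrete, here is a minimal sketch of a supervised InfoNCE-style loss over outfit labels. It is not taken from CCIP (whose actual training objective is not described in this thread); the function names, the `outfit_ids` labels, and the temperature value are all assumptions for illustration. Written in plain Python so it runs without extra dependencies.

```python
import math

def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def outfit_contrastive_loss(embeddings, outfit_ids, temperature=0.1):
    """Hypothetical supervised InfoNCE-style loss.

    Images sharing an outfit id are treated as positives, all other
    images in the batch as negatives. Anchors without any positive
    in the batch are skipped.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and outfit_ids[j] == outfit_ids[i]]
        if not positives:
            continue
        # Temperature-scaled similarities to every other sample.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In a real setup the embeddings would come from a trainable encoder and the loss would be computed with an autodiff framework; the sketch only shows what the objective rewards: same-outfit pairs close together, different-outfit pairs far apart.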
I think newer research is good in general, but I'm not sure the Open Model Initiative would shake the monopoly. VLMs, though, will need even more data to operate, on top of finding open LLM backends (and the Transformer vs. Mamba-esque debate is fun too). P.S. Is this good? https://github.com/CartoonSegmentation/CartoonSegmentation
There are a few scenarios regarding outfits (which can co-occur):
Since full-body segmentation already exists, it would be a prime starting point for building a clustering technique for outfit embeddings.
See also the notes made for a segmentation library: https://github.com/SkyTNT/anime-segmentation/issues/13
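As a rough illustration of the clustering idea above, here is a minimal sketch of greedy cosine-threshold clustering. It assumes each segmented full-body crop has already been turned into an embedding by some outfit encoder (hypothetical, none exists in this thread); `cluster_outfits` and the `threshold` value are made up for the example, and a real pipeline would more likely use an established clustering algorithm.

```python
import math

def normalize(v):
    # Scale a non-zero vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cluster_outfits(embeddings, threshold=0.8):
    """Assign each embedding to the most similar existing cluster,
    or open a new cluster if no centroid reaches the threshold."""
    centroid_sums = []  # running sum of unit vectors per cluster
    labels = []
    for emb in embeddings:
        unit = normalize(emb)
        best, best_sim = -1, threshold
        for k, c in enumerate(centroid_sums):
            sim = sum(a * b for a, b in zip(unit, normalize(c)))
            if sim >= best_sim:
                best, best_sim = k, sim
        if best == -1:
            # No cluster is similar enough: start a new one.
            centroid_sums.append(list(unit))
            labels.append(len(centroid_sums) - 1)
        else:
            # Fold the new member into the cluster centroid.
            centroid_sums[best] = [a + b for a, b in zip(centroid_sums[best], unit)]
            labels.append(best)
    return labels
```

The result depends on input order and on the threshold, so this is only meant to show the shape of the approach, not a tuned method.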