SkyTNT / anime-segmentation

high-accuracy segmentation for anime character
Apache License 2.0
601 stars, 59 forks

Semi-supervised Support? #11

Open narugo1992 opened 1 year ago

narugo1992 commented 1 year ago

We tried exporting images from dataset_generator.DatasetGenerator and found that they do not accurately reflect what real anime images look like. As a result, the trained model's performance on real anime images still needs improvement.

(One sample image) [image]

(Its mask) [image]

Therefore, I believe this training framework should consider supporting semi-supervised learning, allowing users to provide a large number of real anime and illustration images to improve performance on real waifu images. Semi-supervised learning is crucial for training on anime images, especially for tasks like image segmentation, where data annotation is extremely expensive.

narugo1992 commented 1 year ago

Toward this end, we have trained fairly reliable object detection models for characters and faces in anime images (online demo). If directly performing semi-supervised training on completely unlabeled anime images proves difficult, you could consider using the Segment Anything Model (SAM): by passing the detection results (character bounding boxes and face center points) to SAM as prompts, you can guide the training process toward the intended targets.
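A minimal sketch of turning detector output into SAM prompts, assuming boxes are given as (x0, y0, x1, y1) in pixels. The helper `detections_to_sam_prompts` is hypothetical and not part of this repository or of the detection models mentioned above. Each face box is reduced to its center and used as a foreground point, and faces whose center falls outside the character box are dropped.

```python
import numpy as np


def detections_to_sam_prompts(char_box, face_boxes):
    """Hypothetical helper: convert one character box and its face boxes
    into prompt arrays (box, point_coords, point_labels) for SAM.

    Boxes are assumed to be (x0, y0, x1, y1) in pixel coordinates.
    """
    box = np.asarray(char_box, dtype=np.float32)
    x0, y0, x1, y1 = box.tolist()

    # Use each face center as a foreground point, but only if it lies
    # inside the character box (faces of other characters are skipped).
    centers = []
    for fx0, fy0, fx1, fy1 in face_boxes:
        cx, cy = (fx0 + fx1) / 2.0, (fy0 + fy1) / 2.0
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            centers.append((cx, cy))

    if not centers:
        # No usable face point: prompt SAM with the box alone.
        return box, None, None

    point_coords = np.asarray(centers, dtype=np.float32)  # shape (N, 2)
    point_labels = np.ones(len(centers), dtype=np.int64)  # 1 = foreground point
    return box, point_coords, point_labels
```

With the segment-anything package, these arrays would then go to a `SamPredictor`: call `predictor.set_image(rgb_image)`, then `predictor.predict(point_coords=pts, point_labels=lbls, box=box, multimask_output=False)`. The returned mask could serve as a pseudo-label for that unlabeled image.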

SkyTNT commented 1 year ago

It already allows users to provide a large number of real anime and illustration images. Just put the real images and their masks into the imgs and masks folders.
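Before training on real data added this way, a quick pairing check can catch missing masks early. The sketch below assumes that an image and its mask share a filename stem (e.g. `001.jpg` and `001.png`) and use common image extensions. That convention and the helpers `pair_by_stem` and `check_dataset` are assumptions for illustration, not taken from the repository's data loader.

```python
import os
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def pair_by_stem(img_names, mask_names):
    """Match image and mask files by filename stem (e.g. 001.jpg <-> 001.png).

    Returns (pairs, images_without_mask, masks_without_image).
    """
    imgs = {Path(n).stem: n for n in img_names if Path(n).suffix.lower() in IMAGE_EXTS}
    masks = {Path(n).stem: n for n in mask_names if Path(n).suffix.lower() in IMAGE_EXTS}
    pairs = [(imgs[s], masks[s]) for s in sorted(imgs.keys() & masks.keys())]
    return pairs, sorted(imgs.keys() - masks.keys()), sorted(masks.keys() - imgs.keys())


def check_dataset(root):
    """Report paired and unpaired files under root/imgs and root/masks."""
    root = Path(root)
    return pair_by_stem(os.listdir(root / "imgs"), os.listdir(root / "masks"))
```

Non-image files (such as notes or metadata) are ignored rather than reported, so only genuinely unpaired images or masks show up in the result.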

narugo1992 commented 1 year ago

> It already allows users to provide a large number of real anime and illustration images. Just put the real images and their masks into the imgs and masks folders.

I mean semi-supervised learning, where the images are unlabeled. For anime images, it is difficult to obtain a large amount of annotated data, which is why semi-supervised learning is needed.
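One common way to use unlabeled images is confidence-thresholded pseudo-labeling. A teacher model (for example, the current supervised model, or SAM prompted as described earlier) predicts a foreground probability map. Pixels it is confident about become hard targets, and uncertain pixels are excluded from the loss. The sketch below is a generic illustration of that technique; the thresholds 0.1/0.9 and the function names are assumptions, not part of this repository.

```python
import numpy as np


def pseudo_label(prob, low=0.1, high=0.9):
    """Hard pseudo-mask from teacher foreground probabilities, plus a per-pixel weight.

    Pixels with prob <= low or prob >= high are trusted (weight 1);
    everything in between is ignored (weight 0).
    """
    prob = np.asarray(prob, dtype=np.float32)
    mask = (prob >= 0.5).astype(np.float32)                       # hard target
    weight = ((prob <= low) | (prob >= high)).astype(np.float32)  # confidence gate
    return mask, weight


def masked_bce(pred, target, weight, eps=1e-7):
    """Binary cross-entropy averaged over trusted pixels only."""
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float((loss * weight).sum() / max(weight.sum(), 1.0))
```

The gating is what limits error propagation: a wrong teacher prediction only enters training if the teacher was also confident about it.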

SkyTNT commented 1 year ago

Oh, I see, but I haven't had time to implement it recently. If you open a PR, I will merge it.

narugo1992 commented 1 year ago

I just trained a MODNet on the native dataset; here is one sample:

[image]

So maybe semi/self-supervised learning is necessary for this model.

I just studied the MODNet source code, and it provides self-supervised training based on SOC (sub-objectives consistency) for fine-tuning on unlabeled data: https://github.com/ZHKKKe/MODNet/blob/master/src/trainer.py#L177. The expected effect looks like the following image:

[image]

I will conduct experiments with the aforementioned fine-tuning code and, if the results are good, I will submit a pull request. :smile:
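For readers unfamiliar with SOC: as described in the MODNet paper, on an unlabeled image the semantic branch's coarse prediction is pushed to agree with a downsampled (and blurred) version of the fused matte, and the detail branch is pushed to agree with the matte inside the boundary (transition) region. The numpy sketch below illustrates only those two consistency terms. It substitutes plain average pooling for MODNet's downsample-plus-blur and leaves out other details of the linked trainer (such as its frozen backup copy of the network and its loss weights), so it is not a reproduction of that code.

```python
import numpy as np


def downsample(x, factor):
    """Average-pool a 2-D map by an integer factor (stand-in for downsample + blur)."""
    h, w = x.shape[0] // factor * factor, x.shape[1] // factor * factor
    x = x[:h, :w]
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def soc_consistency(semantic, detail, matte, transition, factor=16):
    """Simplified SOC-style losses on an unlabeled image (no ground truth needed).

    semantic:   low-res foreground map from the semantic branch
    detail:     full-res prediction from the detail branch
    matte:      full-res fused matte
    transition: binary full-res mask of the boundary region
    """
    # Semantic branch should agree with a coarse version of the fused matte.
    semantic_loss = float(np.mean((semantic - downsample(matte, factor)) ** 2))
    # Detail branch should agree with the matte inside the transition region.
    n = max(float(transition.sum()), 1.0)
    detail_loss = float(np.abs(transition * (detail - matte)).sum() / n)
    return semantic_loss, detail_loss
```

Neither term uses a label, which is why this kind of objective can be applied directly to unlabeled anime images during fine-tuning.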

Zarxrax commented 1 year ago

> I mean semi-supervised learning, where the images are unlabeled. For anime images, it is difficult to obtain a large amount of annotated data, which is why semi-supervised learning is needed.

I spent the past few months building a new dataset that I was hoping could provide better training data for anime-style images. I haven't tried training on it yet, though, because I wanted to make some big modifications to how the dataset generator code works. I just haven't had time to start writing that code yet.

Semi-supervised training sounds interesting, but wouldn't it only be as good as the model used to segment the images?

narugo1992 commented 1 year ago

@Zarxrax Regarding semi-supervised training, here is my understanding. Consider the following scenarios:

Scenario C will likely perform better than scenario A, but it may not surpass scenario B.

Moreover, in image segmentation tasks, obtaining precisely labeled data is extremely costly, which makes it hard to acquire a large amount of real segmentation training data. The approach suggested by @skytnt, generating synthetic training data, is a compromise. However, the generated images still differ significantly from real anime images, which leaves room for improvement. In fact, in my own testing, models trained this way are only marginally usable, particularly with heavy networks like isnetis, and many issues remain to be addressed.
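To make the synthetic-data point concrete: synthetic segmentation samples are typically produced by alpha-compositing a cut-out character (RGBA) onto an unrelated background, with the alpha channel doubling as the ground-truth mask. The sketch below shows only that basic step. It is an assumption about the general approach, not the repository's DatasetGenerator, which may apply further augmentation. Because the foreground and background never share lighting, line style, or color grading, this is one plausible reason such images can look unlike real illustrations.

```python
import numpy as np


def composite(fg_rgba, bg_rgb):
    """Alpha-composite an RGBA character over an RGB background of the same size.

    Returns the composited uint8 RGB image and the alpha matte in [0, 1],
    which serves as the ground-truth mask for the synthetic sample.
    """
    if fg_rgba.shape[:2] != bg_rgb.shape[:2]:
        raise ValueError("foreground and background must have the same height and width")
    fg = fg_rgba[..., :3].astype(np.float32)
    alpha = fg_rgba[..., 3:4].astype(np.float32) / 255.0  # shape (H, W, 1)
    bg = bg_rgb[..., :3].astype(np.float32)
    img = alpha * fg + (1.0 - alpha) * bg                  # standard "over" blend
    return np.clip(np.rint(img), 0, 255).astype(np.uint8), alpha[..., 0]
```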

In practice, semi-supervised learning often helps models learn better from real-world data without significantly increasing annotation costs. Therefore, given the limited availability of precisely labeled data and the potential for better performance on real-world images, I believe semi-supervised learning is necessary for anime character segmentation.
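A typical way to combine the two data sources is to add a weighted unsupervised term (for example, a pseudo-label or consistency loss computed on unlabeled images) to the usual supervised loss. The weight is ramped up from near zero so that early, unreliable predictions on unlabeled data do not dominate training. The sketch uses the exp(-5(1 - t)^2) ramp-up commonly used in Temporal Ensembling and Mean Teacher; the function names and the `max_weight` and `rampup_steps` defaults are illustrative assumptions.

```python
import math


def sigmoid_rampup(step, rampup_steps):
    """Weight schedule exp(-5 * (1 - t)^2), with t = step / rampup_steps clipped to [0, 1]."""
    if rampup_steps <= 0:
        return 1.0
    t = min(max(step, 0), rampup_steps) / rampup_steps
    return math.exp(-5.0 * (1.0 - t) ** 2)


def semi_supervised_loss(sup_loss, unsup_loss, step, max_weight=1.0, rampup_steps=1000):
    """Supervised loss on labeled images plus a ramped-up unsupervised term."""
    return sup_loss + max_weight * sigmoid_rampup(step, rampup_steps) * unsup_loss
```

Because the combination is plain arithmetic, the same function works unchanged if `sup_loss` and `unsup_loss` are framework tensors (e.g. PyTorch scalars) rather than floats.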