deepghs / waifuc

Efficient Train Data Collector for Anime Waifu
https://deepghs.github.io/waifuc/
MIT License

New feature: removing character tags from TXT files and labeling them with character names (similar to danbooru) #45

Open wochenlong opened 9 months ago

wochenlong commented 9 months ago

For example, I want to retrieve image assets of Shu from Arknights on danbooru, where her tag is "shu_(arknights)". Here's my code:

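from waifuc.action import *
from waifuc.export import *
from waifuc.source import *

if __name__ == '__main__':
    s = DanbooruSource(['shu_(arknights)'])
    s.attach(
        ModeConvertAction('RGB', 'white'),  # convert to RGB on a white background
        NoMonochromeAction(),  # drop monochrome images
        ClassFilterAction(['illustration', 'bangumi']),  # keep illustrations and anime screenshots
        FilterSimilarAction('all'),  # drop near-duplicate images
        FaceCountAction(1),  # keep images with exactly one face
        PersonSplitAction(),  # crop each person into a separate image
        FaceCountAction(1),  # re-check the face count after splitting
        CCIPAction(),  # drop images of other characters
        AlignMinSizeAction(1024),  # shrink images whose short side exceeds 1024
        TaggingAction(force=True),  # tag with the wd14 tagger, overwriting existing tags
        FilterSimilarAction('all'),  # deduplicate again after cropping
        FirstNSelectAction(100),  # stop after the first 100 images
        RandomFilenameAction(ext='.png'),  # assign random .png filenames
    ).export(
        TextualInversionExporter('/root/data/shu_dataset')  # write images and TXT tag files
    )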

Taking one of the images as an example (0e30af3a32c6be3ead1925a63836feeb05f1485c), the current content of its TXT label file is:

1girl, solo, long_hair, very_long_hair, bare_shoulders, pointy_ears, white_background, looking_at_viewer, jewelry, blue_eyes, multicolored_hair, dress, green_hair, earrings, cowboy_shot, simple_background, white_dress, horns, closed_mouth, holding, blue_hair, dragon_horns, long_sleeves, off_shoulder, tassel, bangs, white_coat, gloves, tassel_earrings, blonde_hair, detached_collar

The labels that need to be processed are:

1. Common character features: xx_hair, xx_eyes, bangs, ahoge, ...
2. Semantic duplicates: for dragon_horns and horns, remove "horns" and keep the more specific tag, "dragon_horns".

After removing these labels, the TXT file will generally look like this:

1girl, solo, bare_shoulders, white_background, looking_at_viewer, jewelry, cowboy_shot, simple_background, white_dress, closed_mouth, holding, dragon_horns, long_sleeves, off_shoulder, tassel, white_coat, gloves, tassel_earrings, detached_collar
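To make the two rules concrete, here is a minimal sketch in plain Python (it does not use any waifuc API). The names `CHARACTER_FEATURE_PATTERNS`, `is_character_feature`, `drop_semantic_duplicates`, and `prune_tags` are hypothetical, and the pattern list is my assumption: `_ears` is included only because pointy_ears is also missing from the example result above. The duplicate rule is approximated by dropping a tag whenever another tag ends with `_<tag>`, which is a heuristic and not a real semantic check.

```python
import re

# Assumed patterns for "common character feature" tags; extend as needed.
# ".+_ears" is an assumption inferred from the example (pointy_ears is dropped).
CHARACTER_FEATURE_PATTERNS = [r".+_hair", r".+_eyes", r".+_ears", r"bangs", r"ahoge"]


def is_character_feature(tag):
    """Return True if the whole tag matches one of the assumed feature patterns."""
    return any(re.fullmatch(p, tag) for p in CHARACTER_FEATURE_PATTERNS)


def drop_semantic_duplicates(tags):
    """Drop a tag when a more specific tag ends with '_<tag>' (horns vs dragon_horns)."""
    return [
        tag for tag in tags
        if not any(other != tag and other.endswith("_" + tag) for other in tags)
    ]


def prune_tags(tags):
    """Apply rule 1 (character features), then rule 2 (semantic duplicates)."""
    kept = [tag for tag in tags if not is_character_feature(tag)]
    return drop_semantic_duplicates(kept)


if __name__ == "__main__":
    line = "1girl, solo, long_hair, horns, dragon_horns, dress, white_dress, bangs"
    tags = [t.strip() for t in line.split(",")]
    print(", ".join(prune_tags(tags)))
```

With these assumed patterns, the suffix heuristic also drops dress (covered by white_dress) and earrings (covered by tassel_earrings), which matches the example result. It would not catch duplicates that share no suffix, so a curated overlap table may still be needed.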

Additionally, because "shu_(arknights)" is a new character that the wd14 model cannot recognize, the tag needs to be prepended to the TXT file, resulting in:

```
shu_(arknights), 1girl, solo, bare_shoulders, white_background, looking_at_viewer, jewelry, cowboy_shot, simple_background, white_dress, closed_mouth, holding, dragon_horns, long_sleeves, off_shoulder, tassel, white_coat, gloves, tassel_earrings, detached_collar
```
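Until something like this exists in waifuc, the prepending step could be done as a post-processing pass over the exported directory. The sketch below is plain Python with hypothetical helper names (`prepend_tag`, `prepend_tag_in_dir`). It assumes each image has a sibling `.txt` file containing comma-separated tags, as in the example above.

```python
from pathlib import Path


def prepend_tag(text, character_tag):
    """Return the tag line with character_tag first, without duplicating it."""
    tags = [t.strip() for t in text.split(",") if t.strip()]
    tags = [t for t in tags if t != character_tag]
    return ", ".join([character_tag] + tags)


def prepend_tag_in_dir(dataset_dir, character_tag):
    """Rewrite every *.txt file in dataset_dir in place (assumed export layout)."""
    for txt_path in sorted(Path(dataset_dir).glob("*.txt")):
        original = txt_path.read_text(encoding="utf-8")
        txt_path.write_text(prepend_tag(original, character_tag), encoding="utf-8")


if __name__ == "__main__":
    # Path taken from the export call in the code above.
    prepend_tag_in_dir("/root/data/shu_dataset", "shu_(arknights)")
```

Running the pass twice is harmless, since an existing copy of the character tag is removed before it is prepended again.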
