cyber-meow / anime_screenshot_pipeline

MIT License

Anime2SD

A 99% automated pipeline for constructing training sets from anime and more for text-to-image model training

Demonstration: https://youtu.be/-Nzj6SEU9XY?si=8-o9vN6ToTRTeGea

The old scripts and readme have been moved into scripts_v1.

Note that the new metadata naming follows the waifuc convention and therefore differs from the naming used in the older version. To convert old metadata, use utilities/convert_metadata.py.

Basic Usage

The script automatic_pipeline.py allows you to construct a text-to-image training set from anime with minimal effort. All you have to do is run

python automatic_pipeline.py \
    --anime_name name_of_my_favorite_anime \
    --base_config_file configs/pipelines/base.toml \
    --config_file configs/pipelines/screenshots.toml configs/pipelines/booru.toml [...]

Providing multiple configuration files allows fanart and anime to be processed in parallel (and even multiple anime to be processed in parallel). You can either create your own configuration files or override existing values with command line arguments.

Of course, you can always go without configuration files if you do not need to run multiple pipelines in parallel.

python automatic_pipeline.py \
    --start_stage 1 \
    --end_stage 7 \
    --src_dir /path/to/video_dir \
    --dst_dir /path/to/dataset_dir \
    --character_ref_dir /path/to/ref_image_dir \
    --pipeline_type screenshots \
    --crop_with_head \
    --image_prefix my_favorite_anime \
    --ep_init 3 \
    --log_prefix my_favorite_anime

:bulb: You can first run stages 1 to 3 without --character_ref_dir to cluster characters. Then go through the clusters to quickly construct your reference folder, and run again from stage 3 to stage 7 with --character_ref_dir given. See Wiki for details.
:bulb: Although it is possible to start from stage 0, which downloads the anime automatically, it is still recommended to prepare the anime yourself, as the downloading part is not fully optimized (for example, it may simply hang if there are no seeders).
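
The two-pass workflow from the first tip can be sketched as follows. This is only an illustration and not part of the repository: build_pipeline_command is a hypothetical helper that assembles command lines from the flags shown above, and it assumes that both passes reuse the same --src_dir and --dst_dir. Check the Wiki for what the second pass actually expects.

```python
import shlex


def build_pipeline_command(start_stage, end_stage, src_dir, dst_dir,
                           character_ref_dir=None):
    """Hypothetical helper: assemble an automatic_pipeline.py command line.

    Only flags that appear in this README are used.
    """
    cmd = [
        "python", "automatic_pipeline.py",
        "--start_stage", str(start_stage),
        "--end_stage", str(end_stage),
        "--src_dir", src_dir,
        "--dst_dir", dst_dir,
        "--pipeline_type", "screenshots",
    ]
    # The reference folder is only passed once it has been built by hand.
    if character_ref_dir is not None:
        cmd += ["--character_ref_dir", character_ref_dir]
    return cmd


# Pass 1: stages 1 to 3 without references, to obtain character clusters.
first_pass = build_pipeline_command(
    1, 3, "/path/to/video_dir", "/path/to/dataset_dir")

# Pass 2: stages 3 to 7, after building the reference folder from the clusters.
# Assumption: the same source and destination directories are reused.
second_pass = build_pipeline_command(
    3, 7, "/path/to/video_dir", "/path/to/dataset_dir",
    character_ref_dir="/path/to/ref_image_dir")

print(shlex.join(first_pass))
print(shlex.join(second_pass))
```

The printed commands can be pasted into a terminal, or the lists can be handed to subprocess.run once the reference folder exists.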

There are many arguments (more than 100) that allow you to configure the entire process. You can see all of them in the aforementioned configuration files or by running

python automatic_pipeline.py --help

It is highly recommended to read at least Main Arguments so that you know how to set up things correctly.

Advanced Usage

There are three ways that you can use the script.

Pipeline Overview

The script performs all the following automatically.

Dataset Organization and Training

Installation

  1. Clone this repository

    git clone https://github.com/cyber-meow/anime_screenshot_pipeline
    cd anime_screenshot_pipeline
  2. Depending on your operating system, run either install.sh or install.bat in a terminal

  3. Don't forget to activate the environment before running the main script

Additional Steps and Known Issues

Change Logs

Main

Secondary

TODO / Potential improvements

Contributions are welcome

Secondary

Advanced

Credits