# A 99% automated pipeline to construct training sets from anime and more for text-to-image model training
Demonstration: https://youtu.be/-Nzj6SEU9XY?si=8-o9vN6ToTRTeGea
The old scripts and readme have been moved into `scripts_v1`.
Note that the new metadata naming follows the convention of waifuc and thus differs from the naming used in the older version. For conversion, please use `utilities/convert_metadata.py`.
The script `automatic_pipeline.py` allows you to construct a text-to-image training set from anime with minimal effort. All you have to do is
```bash
python automatic_pipeline.py \
    --anime_name name_of_my_favorite_anime \
    --base_config_file configs/pipelines/base.toml \
    --config_file configs/pipelines/screenshots.toml configs/pipelines/booru.toml [...]
```
Providing multiple configuration files allows for parallel processing of fan arts and anime screenshots (and even for parallel processing of multiple anime). You can either create your own configuration files or override existing values with command-line arguments.
Of course, you can always go without configuration files if you do not need to run multiple pipelines in parallel.
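For reference, the pipeline configuration files are TOML files whose keys mirror the command-line arguments. The sketch below is hypothetical: the keys shown are derived from the flags used elsewhere in this README, and a real config (see `configs/pipelines/`) may contain many more options.

```toml
# Hypothetical config sketch; keys mirror the CLI flags shown in this README.
start_stage = 1
end_stage = 7
src_dir = "/path/to/video_dir"
dst_dir = "/path/to/dataset_dir"
pipeline_type = "screenshots"
image_prefix = "my_favorite_anime"
log_prefix = "my_favorite_anime"
```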
```bash
python automatic_pipeline.py \
    --start_stage 1 \
    --end_stage 7 \
    --src_dir /path/to/video_dir \
    --dst_dir /path/to/dataset_dir \
    --character_ref_dir /path/to/ref_image_dir \
    --pipeline_type screenshots \
    --crop_with_head \
    --image_prefix my_favorite_anime \
    --ep_init 3 \
    --log_prefix my_favorite_anime
```
:bulb: You can first run stages 1 to 3 without `--character_ref_dir` to cluster characters. Then go through the clusters to quickly construct your reference folder and run again from stages 3 to 7 with `--character_ref_dir` now given. See the Wiki for details.
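Concretely, the two-phase workflow from this tip might look as follows (paths are placeholders, and only the flags already shown in this README are used):

```bash
# Phase 1: stages 1-3, no character references yet; characters get clustered
python automatic_pipeline.py \
    --start_stage 1 --end_stage 3 \
    --src_dir /path/to/video_dir \
    --dst_dir /path/to/dataset_dir \
    --pipeline_type screenshots

# Inspect the clusters and copy representative images into a reference folder.

# Phase 2: stages 3-7, with the reference folder now supplied
python automatic_pipeline.py \
    --start_stage 3 --end_stage 7 \
    --src_dir /path/to/video_dir \
    --dst_dir /path/to/dataset_dir \
    --character_ref_dir /path/to/ref_image_dir \
    --pipeline_type screenshots
```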
:bulb: Although it is possible to run from stage 0, which downloads the anime automatically, it is still recommended to prepare the anime yourself, as the downloading step is not fully optimized (it may simply hang if there are no seeders, for example).
There are a lot of arguments (more than 100) that allow you to configure the entire process. See all of them in the aforementioned configuration files or with
```bash
python automatic_pipeline.py --help
```
It is highly recommended to read at least the Main Arguments section so that you know how to set things up correctly.
There are three ways that you can use the script.
The script performs all of the following steps automatically. The resulting dataset is saved in `/path/to/dataset_dir/training`, with a `multiply.txt` in each subfolder indicating how many times the images from that directory are repeated during training. More details are provided in Dataset Organization.

## Installation

Clone this repository:
```bash
git clone https://github.com/cyber-meow/anime_screenshot_pipeline
cd anime_screenshot_pipeline
```
Depending on your operating system, run either `install.sh` or `install.bat` in a terminal. Don't forget to activate the environment before running the main script.
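Assuming the install script sets up a Python virtual environment (the actual environment name and type may differ; check `install.sh` or `install.bat` for your platform), activation typically looks like:

```bash
# Linux/macOS (assuming a venv created by install.sh; adjust the path if needed)
source venv/bin/activate

# Windows (assuming install.bat created the same venv)
# venv\Scripts\activate
```

If the install script instead creates a conda environment, use `conda activate <env_name>` with the name it printed during installation.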
## Additional Steps and Known Issues
The `screenshots` pipeline uses ffmpeg from the command line. On Ubuntu/Debian you can install it with

```bash
sudo apt update && sudo apt install ffmpeg
```

On Windows, provided that Chocolatey is installed, run

```bash
choco install ffmpeg
```

Contributions are welcome!