dailenson/One-DM - GitHub

One-DM: One-Shot Diffusion Mimicker for Handwritten Text Generation

🌟 Introduction

We propose a One-shot Diffusion Mimicker (One-DM) for stylized handwritten text generation, which only requires a single reference sample as style input, and imitates its writing style to generate handwritten text with arbitrary content.
Previous state-of-the-art methods struggle to accurately extract a user's handwriting style from a single sample due to their limited ability to learn styles. To address this issue, we introduce the high-frequency components of the reference sample to enhance the extraction of handwriting style. The proposed style-enhanced module can effectively capture the writing style patterns and suppress the interference of background noise.
Extensive experiments on handwriting datasets in English, Chinese, and Japanese demonstrate that our approach, using a single style reference, even outperforms previous methods that use 15x more references.
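To make the idea of "high-frequency components" concrete, here is a minimal, dependency-free sketch. It assumes a 3x3 Laplacian filter as the high-pass operator; this README does not say which filter One-DM uses, so treat the kernel, the function name `high_frequency`, and the toy image as illustrative assumptions, not the repository's implementation.

```python
from typing import List

Image = List[List[float]]  # 2-D grayscale image as nested lists (illustrative type)


def high_frequency(image: Image) -> Image:
    """Apply a 3x3 Laplacian (assumed high-pass filter) with replicated borders."""
    h, w = len(image), len(image[0])

    def px(y: int, x: int) -> float:
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Sum of the four neighbours minus four times the centre pixel:
    # zero on flat background, large in magnitude at stroke edges.
    return [
        [px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1) - 4 * px(y, x)
         for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    # Hypothetical toy reference sample: light background with one dark vertical stroke.
    sample = [[200.0] * 8 for _ in range(8)]
    for row in sample:
        row[4] = 20.0
    for row in high_frequency(sample):
        print(row)
```

On the toy sample, the uniform background maps to zero while the stroke and its immediate neighbours produce strong responses. This illustrates why a high-pass view can emphasize stroke patterns over a smooth background. In practice the same filter would normally be applied as a convolution in NumPy or PyTorch.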

Overview of the proposed One-DM

🌠 Release

[2024/10/24] We have provided a well-trained One-DM checkpoint on Google Drive and Baidu Drive :)
[2024/09/16] This work is reported by Synced (机器之心).
[2024/09/07]🔥🔥🔥 We open-source the first version of One-DM, which can generate handwritten words. (Later versions supporting Chinese and Japanese will be released soon.)

🔨 Requirements

conda create -n One-DM python=3.8 -y
conda activate One-DM
# install all dependencies into the environment created above
conda env update -n One-DM -f environment.yml

☀️ Datasets

We provide English datasets on Google Drive | Baidu Netdisk | ShiZhi AI. Please download these datasets, unzip them, and move the extracted files to /data.

🐳 Model Zoo

Model	Google Drive	Baidu Netdisk	ShiZhi AI
Pretrained One-DM	Google Drive	Baidu Netdisk	ShiZhi AI
Pretrained OCR model	Google Drive	Baidu Netdisk	ShiZhi AI
Pretrained Resnet18	Google Drive	Baidu Netdisk	ShiZhi AI

Note: Please download these weights, and move them to /model_zoo. (If you cannot access the pre-trained VAE model available on Hugging Face, please refer to the pinned issue for guidance.)
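Before launching a run, it can help to confirm that the downloaded weights are where the scripts expect them. The sketch below is a hypothetical helper, not part of the repository. It checks only the two file names that appear in the training and finetuning commands of this README (`RN18_class_10400.pth` and `vae_HTR138.pth`); the pretrained One-DM checkpoint is left out because its file name is not given here.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the commands in this README; adjust if yours differ.
EXPECTED_WEIGHTS = ("RN18_class_10400.pth", "vae_HTR138.pth")


def missing_weights(model_zoo: Path, names: Iterable[str] = EXPECTED_WEIGHTS) -> List[str]:
    """Return the expected weight files that are not present in model_zoo."""
    return [name for name in names if not (model_zoo / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_weights(Path("model_zoo"))
    if missing:
        print("Missing weights:", ", ".join(missing))
    else:
        print("All expected weights found.")
```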

🏋️ Training & Test

training on English dataset

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train.py \
--feat_model model_zoo/RN18_class_10400.pth \
--log English

finetune on English dataset

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train_finetune.py \
--one_dm ./Saved/IAM64_scratch/English-timestamp/model/epoch-ckpt.pt \
--ocr_model ./model_zoo/vae_HTR138.pth --log English

Note: Please modify timestamp and epoch according to your own path.
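If you would rather not fill in `timestamp` and `epoch` by hand, a small helper can locate a checkpoint for you. This is a hypothetical sketch under two assumptions that this README does not confirm: run directories are named `English-<timestamp>` with timestamps that sort chronologically as strings, and checkpoint files are named `<epoch>-ckpt.pt` with an integer epoch. Check both against your own `./Saved` folder before relying on it.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed checkpoint naming: "<integer epoch>-ckpt.pt".
CKPT_PATTERN = re.compile(r"^(\d+)-ckpt\.pt$")


def latest_checkpoint(stage_dir: Path, log_name: str = "English") -> Optional[Path]:
    """Return the highest-epoch checkpoint from the newest run that has one."""
    # Assumed run naming: "<log_name>-<timestamp>", sortable by name.
    runs = sorted(p for p in stage_dir.glob(f"{log_name}-*") if p.is_dir())
    for run in reversed(runs):  # newest run first
        candidates = []
        for ckpt in (run / "model").glob("*-ckpt.pt"):
            match = CKPT_PATTERN.match(ckpt.name)
            if match:
                candidates.append((int(match.group(1)), ckpt))
        if candidates:
            return max(candidates, key=lambda item: item[0])[1]
    return None  # no run contains a matching checkpoint


if __name__ == "__main__":
    print(latest_checkpoint(Path("./Saved/IAM64_scratch")))
```

The printed path can then be passed to `--one_dm`. Epochs are compared as integers, so `100-ckpt.pt` ranks above `9-ckpt.pt`.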

test on English dataset

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 test.py \
--one_dm ./Saved/IAM64_finetune/English-timestamp/model/epoch-ckpt.pt \
--generate_type oov_u --dir ./Generated/English

Note: Please modify timestamp and epoch according to your own path.

📺 Exhibition

Comparisons with industrial image generation methods on handwritten text generation
Comparisons with industrial image generation methods on Chinese handwriting generation
English handwritten text generation
Chinese and Japanese handwriting generation

❤️ Citation

If you find our work inspiring or use our codebase in your research, please cite our work:

@inproceedings{one-dm2024,
  title={One-Shot Diffusion Mimicker for Handwritten Text Generation},
  author={Dai, Gang and Zhang, Yifan and Ke, Quhui and Guo, Qiangya and Huang, Shuangping},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
