This is the official implementation of Glyph-ByT5 and Glyph-ByT5-v2, introduced in "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering".
⛽⛽⛽ Contact: yuhui.yuan@microsoft.com
2024.06.28: We have removed the weights and code that may have used potentially unauthorized datasets. We will update the checkpoints once the Microsoft RAI process is complete.
We identify two crucial requirements of text encoders for achieving accurate visual text rendering: character awareness and alignment with glyphs. To this end, we propose a customized text encoder, Glyph-ByT5, by fine-tuning the character-aware ByT5 encoder using a meticulously curated paired glyph-text dataset.
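To illustrate what "character-aware" means here, the following is a minimal sketch of ByT5-style byte-level tokenization, written without any library dependency. It assumes the standard ByT5 convention that ids 0, 1, and 2 are reserved for the pad, end-of-sequence, and unknown tokens, so each UTF-8 byte `b` maps to id `b + 3`. The function names `byte_encode` and `byte_decode` are illustrative and are not part of this repository.

```python
# Sketch of ByT5-style byte-level tokenization (illustrative helper names).
# Assumption: ids 0, 1, 2 are reserved for <pad>, </s>, <unk>,
# so every UTF-8 byte b is mapped to token id b + 3.
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
BYTE_OFFSET = 3


def byte_encode(text: str) -> list[int]:
    """Map each UTF-8 byte of `text` to one token id and append </s>."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def byte_decode(ids: list[int]) -> str:
    """Invert byte_encode, skipping special tokens and out-of-range ids."""
    raw = bytes(
        i - BYTE_OFFSET for i in ids if BYTE_OFFSET <= i < BYTE_OFFSET + 256
    )
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    ids = byte_encode("Glyph")
    print(ids)               # one id per character for ASCII text, plus </s>
    print(byte_decode(ids))  # round-trips back to the input string
```

Because every character is represented by its own bytes rather than by a subword piece, the encoder sees the exact spelling of the text to be rendered, which is the property the glyph alignment fine-tuning builds on.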
We present an effective method for integrating Glyph-ByT5 with SDXL, resulting in the Glyph-SDXL model for design image generation. This integration raises text rendering accuracy from less than 20% to nearly 90% on our design image benchmark. Glyph-SDXL also gains the ability to render text paragraphs, achieving high spelling accuracy for tens to hundreds of characters with automated multi-line layouts.
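This README does not specify how rendering accuracy is computed. As a rough illustration only, a word-level spelling accuracy could be sketched as the fraction of target words that appear in the text recognized from the generated image. The function `word_accuracy` and the assumption that an OCR step has already produced `recognized_text` are hypothetical and may differ from the benchmark's actual protocol.

```python
from collections import Counter


def word_accuracy(target_text: str, recognized_text: str) -> float:
    """Fraction of target words found in the recognized text.

    Hypothetical metric: each recognized word can match at most one
    target word, and matching is exact and case-sensitive.
    """
    target_words = target_text.split()
    if not target_words:
        return 1.0  # nothing to render, so nothing can be misspelled
    remaining = Counter(recognized_text.split())
    hits = 0
    for word in target_words:
        if remaining[word] > 0:
            remaining[word] -= 1
            hits += 1
    return hits / len(target_words)


if __name__ == "__main__":
    # "wrold" is a misspelling, so only one of the two target words matches.
    print(word_accuracy("hello world", "hello wrold"))
```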
We deliver a powerful customized multilingual text encoder, Glyph-ByT5-v2, and a strong aesthetic graphic generation model, Glyph-SDXL-v2, that support accurate spelling in $\sim10$ different languages.
For a detailed guide on Glyph-SDXL and Glyph-SDXL-v2 inference, see this folder.
For a detailed guide on Glyph-ByT5 alignment pretraining, see this folder.
If you find this code useful in your research, please consider citing:
@article{liu2024glyph,
title={Glyph-byt5: A customized text encoder for accurate visual text rendering},
author={Liu, Zeyu and Liang, Weicong and Liang, Zhanhao and Luo, Chong and Li, Ji and Huang, Gao and Yuan, Yuhui},
journal={arXiv preprint arXiv:2403.09622},
year={2024}
}
and
@article{liu2024glyphv2,
title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering},
author={Liu, Zeyu and Liang, Weicong and Zhao, Yiming and Chen, Bohan and Li, Ji and Yuan, Yuhui},
journal={arXiv preprint arXiv:2406.10208},
year={2024}
}