Authors: Yingchaojie Feng, Xingbo Wang, Kam Kwai Wong, Sijia Wang, Yuhong Lu, Minfeng Zhu, Baicheng Wang, Wei Chen
Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts. The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords. To facilitate interactive prompt refinement, PromptMagician introduces a multi-level visualization for the cross-modal embedding of the retrieved images and recommended keywords, and supports users in specifying multiple criteria for personalized exploration. Two usage scenarios, a user study, and expert interviews demonstrate the effectiveness and usability of our system, suggesting it facilitates prompt engineering and improves the creativity support of the generative text-to-image model.
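The retrieve-then-recommend idea described above can be illustrated with a small sketch. This is not the system's actual model: the bag-of-words `embed` function is a toy stand-in for a learned embedding, the keyword score is a plain document count, and `CORPUS`, `retrieve`, and `recommend_keywords` are hypothetical names introduced only for this example.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(text.lower().replace(",", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(user_prompt, corpus, k=2):
    """Return the k corpus prompts most similar to the user prompt."""
    query = embed(user_prompt)
    ranked = sorted(corpus, key=lambda p: cosine(query, embed(p)), reverse=True)
    return ranked[:k]


def recommend_keywords(user_prompt, retrieved, n=2):
    """Rank words that appear in retrieved prompts but not in the user prompt.

    Assumption: a word's score is the number of retrieved prompts containing it.
    """
    seen = set(embed(user_prompt))
    counts = Counter(
        w for prompt in retrieved for w in set(embed(prompt)) if w not in seen
    )
    return [w for w, _ in counts.most_common(n)]


# Hypothetical mini-corpus standing in for DiffusionDB prompts.
CORPUS = [
    "a castle on a hill, oil painting, highly detailed",
    "a castle at sunset, oil painting, dramatic lighting",
    "portrait of a cat, studio photo",
]

if __name__ == "__main__":
    hits = retrieve("a castle", CORPUS, k=2)
    print(hits)
    print(recommend_keywords("a castle", hits, n=2))
```

In this toy setup, the two castle prompts are retrieved for the query "a castle", and the words they share beyond the query ("oil", "painting") surface as candidate keywords. A real pipeline would replace `embed` with a cross-modal embedding model and score keywords by importance and relevance rather than raw counts.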
The environment consists of a frontend (React 18.2.0, D3 7.8.2) and a backend (Python 3.7 or above).
cd back-end
pip install -r requirements.txt
python /diffusionDB/download.py
Download the pre-processed data (for DiffusionDB 2m_first_100k and GPU environments) and move the folders to the back-end/.cache directory. You can also create your own version by referring to workflow.py.
Set up the backend (configure config.py and run_sd.sh first; we use 8 GPUs by default).
cd server
sh run_sd.sh
python server.py
cd front-end
npm install
npm start
If this paper and tool help your research projects, please consider citing our paper:
@article{feng2023promptmagician,
title={PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation},
author={Feng, Yingchaojie and Wang, Xingbo and Wong, Kam Kwai and Wang, Sijia and Lu, Yuhong and Zhu, Minfeng and Wang, Baicheng and Chen, Wei},
journal={IEEE Transactions on Visualization and Computer Graphics},
volume={30},
number={1},
pages={295--305},
year={2024},
doi={10.1109/TVCG.2023.3327168}
}