Project-MONAI / GenerativeModels

MONAI Generative Models makes it easy to train, evaluate, and deploy generative models and related applications

I have a question about the prompt when generating a chest image. #456

Closed dlsgurdlfkd closed 6 months ago

dlsgurdlfkd commented 9 months ago

Thanks for the great research!!

According to the paper "Brain imaging generation with latent diffusion models," the diffusion model receives conditioning variables (age, sex, ventricular volume, brain volume) when it generates brain images.

On the other hand, when generating a chest image, how do you convert prompts into conditioning variables?

Looking at inference.json, it seems to use a pre-trained CLIPTokenizer and CLIPTextModel.

When generating a chest image, if I enter a prompt such as "Big right-sided pleural effusion", will the pre-trained CLIPTokenizer and CLIPTextModel convert the prompt into conditioning variables?
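
For reference, here is a minimal sketch of how I understand the text encoding step. The checkpoint name, padding settings, and shapes below are my own assumptions, not necessarily what the bundle's inference.json actually uses:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Assumed checkpoint; the bundle may point at a different pre-trained CLIP model.
checkpoint = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(checkpoint)
text_encoder = CLIPTextModel.from_pretrained(checkpoint)

prompt = "Big right-sided pleural effusion"
tokens = tokenizer(
    prompt,
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    # last_hidden_state has shape (batch, sequence_length, hidden_size),
    # e.g. (1, 77, 768) for this encoder.
    prompt_embedding = text_encoder(input_ids=tokens.input_ids).last_hidden_state

print(prompt_embedding.shape)
```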

Is it correct that conditioning variables are used? If so, could you tell me which conditioning variables are used when generating chest images?

Thank you.

marksgraham commented 9 months ago

Hi there,

The tokenizer and text encoder will not convert the prompt into human-interpretable conditioning variables (e.g. age, volume) - just into an embedding space that is meaningful to the model itself, but not to you! So the only way to control the generation in a human-interpretable way is through the text prompt.
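
To make that concrete, here is a rough sketch (the channel/level settings are illustrative, not the bundle's exact configuration) of how a text embedding is consumed: it is passed as cross-attention `context` to the diffusion UNet, so the model only ever sees a sequence of embedding vectors, never named variables such as age or volume:

```python
import torch
from generative.networks.nets import DiffusionModelUNet

unet = DiffusionModelUNet(
    spatial_dims=2,
    in_channels=4,            # latent channels from the autoencoder (assumed)
    out_channels=4,
    num_res_blocks=2,
    num_channels=(128, 256, 512),
    attention_levels=(False, True, True),
    num_head_channels=64,
    with_conditioning=True,
    cross_attention_dim=768,  # must match the text encoder's hidden size
)

# Stand-ins for a latent image, a sampled timestep, and the CLIP text embedding
# produced in the encoding sketch above.
latents = torch.randn(1, 4, 64, 64)
timesteps = torch.randint(0, 1000, (1,))
prompt_embedding = torch.randn(1, 77, 768)

# The prompt only enters the model here, as cross-attention context.
noise_pred = unet(latents, timesteps=timesteps, context=prompt_embedding)
```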