Is DragDiffusion suitable for black-box image editing?

Dear Mr. Shi, nice to meet you! I just read your code and find it very interesting. I have a question and would appreciate your help. In my case, I have a target image and a black-box online image-generation service. The service is also based on a diffusion model: it takes an image and a text prompt as input and outputs a generated image. My goal is to find the unknown input image and text prompt that make the generated output most similar to my target image, as measured by cosine similarity and SSIM. Are DDIM inversion or null-text inversion suitable for this black-box case, or do they only apply to white-box, open-source models? Could you please give me some advice on how to achieve this goal (other than a gradient-descent method, which needs too many iterations)? Thank you very much!
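For concreteness, the similarity objective could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the equal weighting, and the choice to compute cosine similarity on raw pixels are all hypothetical, and the SSIM here is a simplified single-window variant over the whole image rather than the usual locally windowed SSIM.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two images, treated as flat pixel vectors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def global_ssim(a, b, data_range=255.0):
    """Simplified SSIM computed once over the whole image (assumption:
    no sliding window, unlike the standard SSIM definition)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast/structure term
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)


def similarity_score(generated, target, weight=0.5):
    """Hypothetical combined objective: weighted mix of cosine and SSIM."""
    cos = cosine_similarity(generated, target)
    ssim = global_ssim(generated, target)
    return weight * cos + (1.0 - weight) * ssim
```

Because the service is a black box, a score like this can only be evaluated on its outputs, one query at a time; no gradients with respect to the input image or prompt are available.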

Yujun-Shi / DragDiffusion

Is DragDiffusion suitable for black-box image editing? #64