chenfei-wu / TaskMatrix


Image editing prompts generate an entirely different image #373

Open huluobohua opened 1 year ago

huluobohua commented 1 year ago

After much trial and error I got visualGPT working on my M1 Mac. Generating images from text and generating text descriptions from images both work well. However, when I upload an image and ask for an edit, I get a completely new image.

E.g., I uploaded a picture and asked for a description. The output was "a dog with flowers on its head", which was accurate. But when I asked it to "edit the picture to remove the flowers", I got a picture of a woman and the caption: "this image has been updated to show the dog without the flower crown".

Any clues what's going on?

yishaik commented 1 year ago

Had the same issue. It seems the fix would have to involve using a mask file; still haven't figured out how to do that.
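
For anyone exploring the mask route: mask-based editing means giving the model an explicit mask image marking the region to repaint. A minimal sketch using diffusers' StableDiffusionInpaintPipeline (generic diffusers usage, not this project's code; the file paths and prompt are placeholders):

    # Mask-based inpainting sketch (assumes the diffusers and Pillow packages;
    # paths and prompt are illustrative placeholders, not from this repo).
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting"
    )
    init_image = Image.open("dog.png").convert("RGB")
    # White pixels mark the region to repaint (the flowers); black pixels are kept.
    mask_image = Image.open("flowers_mask.png").convert("L")
    edited = pipe(prompt="a dog", image=init_image, mask_image=mask_image).images[0]
    edited.save("dog_no_flowers.png")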

Wang-Xiaodong1899 commented 1 year ago

Hi @huluobohua, @yishaik. Can you show the command you used?

Actually, if you want to use ImageEditing, please use this command: python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0". Thanks~
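
For reference, the --load argument is a comma-separated list of ToolName_device pairs that gets parsed into the load_dict shown in the startup log later in this thread. A rough sketch of that parsing (the actual helper in visual_chatgpt.py may differ):

    # Hedged sketch: parse "ImageCaptioning_mps,Inpainting_cuda:0" into
    # {"ImageCaptioning": "mps", "Inpainting": "cuda:0"}; rsplit keeps
    # device strings like "cuda:0" intact.
    def parse_load(arg: str) -> dict:
        return dict(item.rsplit("_", 1) for item in arg.split(","))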

huluobohua commented 1 year ago

@Wang-Xiaodong1899 Thanks. I'm actually using a previous version of the project, as it's the only way I could get it working on my Mac M1. I don't have Text2Box, but I'm running:

    self.llm = OpenAI(temperature=0)
    self.edit = ImageEditing(device="mps")
    self.i2t = ImageCaptioning(device="mps")
    self.t2i = T2I(device="cpu")
    self.BLIPVQA = BLIPVQA(device="mps")
    self.image2seg = image2seg()
    self.seg2image = seg2image(device="mps")
    self.pix2pix = Pix2Pix(device="mps")
    self.memory = ConversationBufferMemory(memory_key="chat_history", output_key='output')

All through MPS except T2I, as you can see. I'm guessing seg2image and image2seg offer the same functionality as Segmenting, i2t is ImageCaptioning, and ImageEditing is basically Inpainting. I'm not sure what Text2Box would be in this previous version of the file.

Can you help?

Thanks!

Wang-Xiaodong1899 commented 1 year ago

Hi @huluobohua, my concern is whether GroundingDINO, which the current ImageEditing depends on, can run successfully on the Mac M1 chip.

To use our previous version without GroundingDINO on a Mac M1, you can download the project Zip of the earlier commit at https://github.com/microsoft/TaskMatrix/tree/584ae89c4559a36f7666e82a6ec524123e159ac5. Then try running python visual_chatgpt.py --load "ImageCaptioning_mps,ImageEditing_mps,Text2Image_mps". Hope this helps! Thanks~

BTW, the previous ImageEditing module is not very stable, so you may need to try several times to get a good editing result.

huluobohua commented 1 year ago

This is what I get after running your suggestion:

    (visgpt) melbreton@Melvins-MBP TaskMatrix % python visual_chatgpt.py --load "ImageCaptioning_mps,ImageEditing_mps,Text2Image_mps"
    /Users/melbreton/miniconda3/envs/visgpt/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/ms_deform_attn.py:31: UserWarning: Failed to load custom C++ ops. Running on CPU mode Only!
      warnings.warn("Failed to load custom C++ ops. Running on CPU mode Only!")
    Initializing VisualChatGPT, load_dict={'ImageCaptioning': 'mps', 'ImageEditing': 'mps', 'Text2Image': 'mps'}
    Initializing ImageCaptioning to mps
    Traceback (most recent call last):
      File "/Users/melbreton/TaskMatrix/visual_chatgpt.py", line 1349, in <module>
        bot = ConversationBot(load_dict=load_dict)
      File "/Users/melbreton/TaskMatrix/visual_chatgpt.py", line 1262, in __init__
        self.models[class_name] = globals()[class_name](device=device)
    TypeError: __init__() got an unexpected keyword argument 'device'
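
The TypeError at the bottom is the signature mismatch you would expect from a version mix-up: the loader passes device=... to every tool class it constructs, but the class actually on disk doesn't accept that argument. A hypothetical minimal reproduction (the class body is invented for illustration):

    # Hypothetical illustration of the mismatch behind the TypeError above.
    class ImageCaptioning:
        def __init__(self):  # an older-style tool class with no `device` parameter
            pass

    ImageCaptioning(device="mps")
    # TypeError: __init__() got an unexpected keyword argument 'device'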

Wang-Xiaodong1899 commented 1 year ago

Please check that you downloaded the correct Zip at https://github.com/microsoft/TaskMatrix/tree/584ae89c4559a36f7666e82a6ec524123e159ac5.

[Screenshot 2023-04-22 08 59 41]

You should click Download ZIP rather than clone the repo. This commit version can successfully load ImageEditing with the "device" argument.

huluobohua commented 1 year ago

Thanks @Wang-Xiaodong1899 - I downloaded the zip and set the project up locally from it. It installs and runs correctly, but when I ask it to describe an image it always responds "The image you provided is a a a a a a a a a a a a a a a a a a a.".

When I try to remove something from the image I get the following:

    Entering new AgentExecutor chain...
    Action: Remove Something From The Photo
    Action Input: image/d61be9a8.png, flowers
    Traceback (most recent call last):
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/gradio/routes.py", line 384, in run_predict
        output = await app.get_blocks().process_api(
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/gradio/blocks.py", line 1032, in process_api
        result = await self.call_function(
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/gradio/blocks.py", line 844, in call_function
        prediction = await anyio.to_thread.run_sync(
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
        return await get_asynclib().run_sync_in_worker_thread(
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
        return await future
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
        result = context.run(func, *args)
      File "visual_chatgpt.py", line 1070, in run_text
        res = self.agent({"input": text.strip()})
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/chains/base.py", line 168, in __call__
        raise e
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/chains/base.py", line 165, in __call__
        outputs = self._call(inputs)
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/agents/agent.py", line 503, in _call
        next_step_output = self._take_next_step(
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/agents/agent.py", line 420, in _take_next_step
        observation = tool.run(
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/tools/base.py", line 71, in run
        raise e
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/tools/base.py", line 68, in run
        observation = self._run(tool_input)
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/agents/tools.py", line 17, in _run
        return self.func(tool_input)
      File "visual_chatgpt.py", line 275, in inference_remove
        return self.inference_replace(f"{image_path},{to_be_removed_txt},background")
      File "visual_chatgpt.py", line 286, in inference_replace
        mask_image = self.mask_former.inference(image_path, to_be_replaced_txt)
      File "visual_chatgpt.py", line 243, in inference
        outputs = self.model(**inputs)
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1426, in forward
        vision_outputs = self.clip.vision_model(
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 867, in forward
        hidden_states = self.embeddings(pixel_values)
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 215, in forward
        embeddings = embeddings + self.interpolate_position_embeddings((new_shape, new_shape))
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 196, in interpolate_position_embeddings
        nn.functional.interpolate(a, new_size, mode="bicubic", align_corners=False)
      File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/torch/nn/functional.py", line 3946, in interpolate
        return torch._C._nn.upsample_bicubic2d(input, output_size, align_corners, scale_factors)
    NotImplementedError: The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
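
Note the final lines of the traceback: PyTorch itself names the workaround for the missing MPS op. The environment variable has to be set before torch is imported, e.g. by prefixing the launch command with PYTORCH_ENABLE_MPS_FALLBACK=1, or at the top of the entry script:

    # Enable CPU fallback for ops not yet implemented on MPS, as the
    # NotImplementedError above suggests; must run before `import torch`.
    import os
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

    import torch  # aten::upsample_bicubic2d.out now falls back to CPU (slower)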