huluobohua opened 1 year ago
Had the same issue; it seems the fix would have to involve using a mask file. Still haven't figured out how.
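For anyone else stuck here, this is a rough sketch of what I mean by mask-based editing, using the diffusers inpainting pipeline directly (the checkpoint ID, file names, and prompt are placeholders, not what visual_chatgpt.py itself does):

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Pick MPS on Apple Silicon when available, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"   # placeholder checkpoint
).to(device)

image = Image.open("input.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))   # white = region to replace

# Prompting with "background" replaces the masked object, which is roughly
# what a "remove X from the photo" request needs.
result = pipe(prompt="background", image=image, mask_image=mask).images[0]
result.save("edited.png")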
Hi @huluobohua, @yishaik. Can you show the command you used?
Actually, if you want to use ImageEditing, please use this command: python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0". Thanks~
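For reference, the --load string is just a comma-separated list of Tool_device pairs. A rough approximation of how it turns into the load_dict shown in the startup log (a sketch of the format, not the exact parsing code in visual_chatgpt.py) would be:

def parse_load(load: str) -> dict:
    # Split e.g. "Inpainting_cuda:0" into {"Inpainting": "cuda:0"}.
    load_dict = {}
    for entry in load.split(","):
        name, device = entry.split("_", 1)
        load_dict[name.strip()] = device.strip()
    return load_dict

print(parse_load("Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0"))
# {'Text2Box': 'cuda:0', 'Segmenting': 'cuda:0', 'Inpainting': 'cuda:0', 'ImageCaptioning': 'cuda:0'}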
@Wang-Xiaodong1899 Thanks. I'm actually using a previous version of the project, as it's the only way I could get it working on a Mac M1. I don't have Text2Box, but I'm running:
self.llm = OpenAI(temperature=0)
self.edit = ImageEditing(device="mps")
self.i2t = ImageCaptioning(device="mps")
self.t2i = T2I(device="cpu")
self.BLIPVQA = BLIPVQA(device="mps")
self.image2seg = image2seg()
self.seg2image = seg2image(device="mps")
self.pix2pix = Pix2Pix(device="mps")
self.memory = ConversationBufferMemory(memory_key="chat_history", output_key='output')
All through MPS except T2I, as you can see. I'm guessing seg2image and image2seg offer the same functionality as Segmenting, i2t is ImageCaptioning, and ImageEditing is basically Inpainting. I'm not sure what Text2Box would map to in this previous version of the file.
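My working assumption for the mapping (just my own guess from the names, not taken from the repo) is roughly:

old_to_new = {
    "ImageEditing":    "Inpainting",       # mask-based remove/replace
    "ImageCaptioning": "ImageCaptioning",  # i2t
    "T2I":             "Text2Image",
    "image2seg":       "Segmenting",
    "seg2image":       "SegText2Image",
    # "Text2Box" (GroundingDINO) seems to have no counterpart in the old file
}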
Can you help?
Thanks!
Hi @huluobohua, I am concerned about whether the GroundingDINO included for ImageEditing can be successfully used on the Mac M1 chip.
To use our previous version without GroundingDINO on a Mac M1, you can download the previous project zip at https://github.com/microsoft/TaskMatrix/tree/584ae89c4559a36f7666e82a6ec524123e159ac5. Try running python visual_chatgpt.py --load "ImageCaptioning_mps,ImageEditing_mps,Text2Image_mps". Hope this helps! Thanks~
BTW, the previous ImageEditing module is not very stable, so you may need to try several times to get good editing results.
This is what I get after running your suggestion:
(visgpt) melbreton@Melvins-MBP TaskMatrix % python visual_chatgpt.py --load "ImageCaptioning_mps,ImageEditing_mps,Text2Image_mps"
/Users/melbreton/miniconda3/envs/visgpt/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/ms_deform_attn.py:31: UserWarning: Failed to load custom C++ ops. Running on CPU mode Only!
warnings.warn("Failed to load custom C++ ops. Running on CPU mode Only!")
Initializing VisualChatGPT, load_dict={'ImageCaptioning': 'mps', 'ImageEditing': 'mps', 'Text2Image': 'mps'}
Initializing ImageCaptioning to mps
Traceback (most recent call last):
File "/Users/melbreton/TaskMatrix/visual_chatgpt.py", line 1349, in
Please check if you downloaded the correct zip at https://github.com/microsoft/TaskMatrix/tree/584ae89c4559a36f7666e82a6ec524123e159ac5.
You should click Download ZIP rather than clone the repo. This commit version can successfully load ImageEditing with a "device" argument.
Thanks @Wang-Xiaodong1899. I downloaded the zip and set up the repo locally from it. It installs and runs correctly, but when I ask it to describe an image it always responds "The image you provided is a a a a a a a a a a a a a a a a a a a.".
When I try to remove something from the image I get the following:
Entering new AgentExecutor chain...
Action: Remove Something From The Photo
Action Input: image/d61be9a8.png, flowers
Traceback (most recent call last):
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/gradio/routes.py", line 384, in run_predict
output = await app.get_blocks().process_api(
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/gradio/blocks.py", line 1032, in process_api
result = await self.call_function(
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/gradio/blocks.py", line 844, in call_function
prediction = await anyio.to_thread.run_sync(
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "visual_chatgpt.py", line 1070, in run_text
res = self.agent({"input": text.strip()})
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/chains/base.py", line 168, in call
raise e
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/chains/base.py", line 165, in call
outputs = self._call(inputs)
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/agents/agent.py", line 503, in _call
next_step_output = self._take_next_step(
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/agents/agent.py", line 420, in _take_next_step
observation = tool.run(
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/tools/base.py", line 71, in run
raise e
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/tools/base.py", line 68, in run
observation = self._run(tool_input)
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/langchain/agents/tools.py", line 17, in _run
return self.func(tool_input)
File "visual_chatgpt.py", line 275, in inference_remove
return self.inference_replace(f"{image_path},{to_be_removed_txt},background")
File "visual_chatgpt.py", line 286, in inference_replace
mask_image = self.mask_former.inference(image_path, to_be_replaced_txt)
File "visual_chatgpt.py", line 243, in inference
outputs = self.model(inputs)
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1426, in forward
vision_outputs = self.clip.vision_model(
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 867, in forward
hidden_states = self.embeddings(pixel_values)
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 215, in forward
embeddings = embeddings + self.interpolate_position_embeddings((new_shape, new_shape))
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 196, in interpolate_position_embeddings
nn.functional.interpolate(a, new_size, mode="bicubic", align_corners=False)
File "/Users/melbreton/miniconda3/envs/tm2/lib/python3.8/site-packages/torch/nn/functional.py", line 3946, in interpolate
return torch._C._nn.upsample_bicubic2d(input, output_size, align_corners, scale_factors)
NotImplementedError: The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1
to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
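In case it helps anyone hitting the same error: as the message says, one workaround is to let unsupported ops fall back to the CPU. A minimal sketch of what that looks like (the variable has to be set before torch is imported, so either export it in the shell before launching visual_chatgpt.py or set it at the very top of the script):

import os
# Must be set before torch is imported so the MPS backend picks it up.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

x = torch.randn(1, 3, 16, 16, device="mps")
# aten::upsample_bicubic2d had no MPS kernel at the time of this error; with the
# fallback enabled it runs on CPU instead of raising NotImplementedError.
y = torch.nn.functional.interpolate(x, size=(32, 32), mode="bicubic", align_corners=False)
print(y.shape)

Or equivalently from the shell: PYTORCH_ENABLE_MPS_FALLBACK=1 python visual_chatgpt.py --load "ImageCaptioning_mps,ImageEditing_mps,Text2Image_mps"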
After much trial and error I got Visual ChatGPT working on my M1 Mac. Generating images from text and generating text descriptions from images both work well. However, when I upload an image and ask for an edit, I get a completely new image.
E.g.: I uploaded a picture and asked for a description. The output was "a dog with flowers on its head", which was accurate. But when I asked it to "edit the picture to remove the flowers", I got a picture of a woman, and the caption: "this image has been updated to show the dog without the flower crown".
Any clues what's going on?