chenfei-wu / TaskMatrix

34.51k stars · 3.32k forks

RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1, 256, 256] because the unspecified dimension size -1 can be any value and is ambiguous #391

Open SunTongtongtong opened 1 year ago

SunTongtongtong commented 1 year ago

Hello there,

I am trying to generate an image from my uploaded image and modify it with a text input. I load the models as #373 suggests:

python visual_chatgpt.py --load "Text2Box_cuda:1,Segmenting_cuda:1,Inpainting_cuda:0,ImageCaptioning_cuda:0"

However, I receive the error in the title.

When I load all the models with the default command from the README:

python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,
    Inpainting_cuda:0,ImageCaptioning_cuda:1,
    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:1,
    InstructPix2Pix_cuda:0,Image2Scribble_cpu,ScribbleText2Image_cuda:1,
    SegText2Image_cuda:0,Image2Pose_cpu,PoseText2Image_cuda:0,
    Image2Hed_cpu,HedText2Image_cuda:1,Image2Normal_cpu,
    NormalText2Image_cuda:1"

The server runs, but the generated images look nothing like the input.

Can anyone help?

SunTongtongtong commented 1 year ago

======>Auto Resize Image... Resize image from 493x512 to 512x512
/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/transformers/generation/utils.py:1313: UserWarning: Using `max_length`'s default (20) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(

Processed ImageCaptioning, Input Image: image/aac0c16e.png, Output Text: a puppy sitting in a truck with hay

Processed run_image, Input image: image/aac0c16e.png
Current state: [('image/aac0c16e.png', 'Received. ')]
Current Memory: Human: provide a figure named image/aac0c16e.png. The description is: a puppy sitting in a truck with hay. This information helps you to understand this image, but you should use tools to finish following tasks, rather than directly imagine from my description. If you understand, say "Received". AI: Received.
history_memory: Human: provide a figure named image/aac0c16e.png. The description is: a puppy sitting in a truck with hay. This information helps you to understand this image, but you should use tools to finish following tasks, rather than directly imagine from my description. If you understand, say "Received". AI: Received.
n_tokens: 48

Entering new AgentExecutor chain...
Action: Replace Something From The Photo
Action Input: image/aac0c16e.png, puppy, small dog
image_path=image/aac0c16e.png, to_be_replaced_txt= puppy
/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/transformers/modeling_utils.py:862: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/gradio/routes.py", line 412, in run_predict
    output = await app.get_blocks().process_api(
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/gradio/blocks.py", line 1299, in process_api
    result = await self.call_function(
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/gradio/blocks.py", line 1021, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "visual_chatgpt.py", line 1307, in run_text
    res = self.agent({"input": text.strip()})
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/langchain/chains/base.py", line 168, in __call__
    raise e
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/langchain/chains/base.py", line 165, in __call__
    outputs = self._call(inputs)
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/langchain/agents/agent.py", line 503, in _call
    next_step_output = self._take_next_step(
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/langchain/agents/agent.py", line 420, in _take_next_step
    observation = tool.run(
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/langchain/tools/base.py", line 71, in run
    raise e
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/langchain/tools/base.py", line 68, in run
    observation = self._run(tool_input)
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/langchain/agents/tools.py", line 17, in _run
    return self.func(tool_input)
  File "visual_chatgpt.py", line 1233, in inference_replace_sam
    masks = self.sam.get_mask_with_boxes(image_pil, image, boxes_filt)
  File "visual_chatgpt.py", line 841, in get_mask_with_boxes
    masks, _, _ = self.sam_predictor.predict_torch(
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/segment_anything/predictor.py", line 229, in predict_torch
    low_res_masks, iou_predictions = self.model.mask_decoder(
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/segment_anything/modeling/mask_decoder.py", line 94, in forward
    masks, iou_pred = self.predict_masks(
  File "/import/sgg-homes/ss014/software/anaconda3/envs/visgpt_2/lib/python3.8/site-packages/segment_anything/modeling/mask_decoder.py", line 144, in predict_masks
    masks = (hyper_in @ upscaled_embedding.view(b, c, h * w)).view(b, -1, h, w)
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1, 256, 256] because the unspecified dimension size -1 can be any value and is ambiguous
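The error message itself is a downstream symptom: if the detector hands SAM a batch of zero boxes, the mask decoder tries to reshape a 0-element tensor with an inferred `-1` dimension, which is ambiguous. A minimal sketch (using NumPy in place of PyTorch for illustration; `reshape_masks` is a hypothetical stand-in for the `view` call in `predict_masks`, not repo code):

```python
import numpy as np

def reshape_masks(flat, b, h, w):
    """Mimics masks = (hyper_in @ upscaled_embedding.view(...)).view(b, -1, h, w)."""
    return flat.reshape(b, -1, h, w)

# With one box and 4 mask channels, -1 is inferred unambiguously as 4.
ok = reshape_masks(np.zeros((1, 4 * 8 * 8)), 1, 8, 8)
print(ok.shape)  # (1, 4, 8, 8)

# With zero boxes there are 0 elements, so -1 could be anything -> error,
# analogous to the RuntimeError in the traceback above.
try:
    reshape_masks(np.zeros((0, 0)), 0, 8, 8)
except ValueError as e:
    print("zero-box batch:", e)
```

So the real question is why the detector returned no boxes in the first place.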

SunTongtongtong commented 1 year ago

I found the reason: the GroundingDINO object detector does not work when Text2Box is loaded on a CUDA device other than cuda:0. When I load it on cuda:0 instead, it works. I don't understand why this happens, though.
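Whatever the device-placement cause turns out to be, a defensive check before the SAM call would turn this crash into a readable error. This is a hypothetical sketch, not code from the repo; the names `boxes_filt` and `inference_replace_sam` are taken from the traceback above:

```python
def ensure_boxes(boxes_filt, query):
    """Guard one could add in inference_replace_sam before calling SAM:
    fail with a clear message instead of the ambiguous reshape RuntimeError
    when GroundingDINO detects nothing."""
    if len(boxes_filt) == 0:
        raise ValueError(
            f"GroundingDINO found no boxes for '{query}'; "
            "check that the detector and its inputs are on the same device."
        )
    return boxes_filt

# Example: one detected box passes through unchanged.
boxes = ensure_boxes([[0, 0, 10, 10]], "puppy")
print(boxes)  # [[0, 0, 10, 10]]
```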

damengdameng commented 1 year ago

> I found the reason: the GroundingDINO object detector does not work when Text2Box is loaded on a CUDA device other than cuda:0. When I load it on cuda:0 instead, it works. I don't understand why this happens, though.

I met the same issue. When I load Text2Box on cuda:2, a process is also created on cuda:0. I don't know whether part of Text2Box has to run on cuda:0.
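If the stray cuda:0 context comes from something in the stack defaulting to the first visible GPU, one common workaround (standard CUDA runtime behaviour, not something verified against this repo) is to remap devices with `CUDA_VISIBLE_DEVICES` so the physical card you want appears as logical cuda:0 inside the process:

```shell
# Make physical GPU 2 the only visible device; inside the process it
# becomes logical cuda:0, so anything hard-coded to cuda:0 lands on it.
export CUDA_VISIBLE_DEVICES=2
echo "logical cuda:0 -> physical GPU $CUDA_VISIBLE_DEVICES"
# Then launch with Text2Box on the (remapped) cuda:0, e.g.:
# python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0"
```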

diyifirstone commented 1 year ago

Please tell me how to deal with this problem.