IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://arxiv.org/abs/2401.14159
Apache License 2.0

Inpainting error when running `python gradio_app.py` #44

Closed yoke233 closed 1 year ago

yoke233 commented 1 year ago
final text_encoder_type: bert-base-uncased
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Model loaded from /home/yoke/.cache/huggingface/hub/models--ShilongLiu--GroundingDINO/snapshots/6fb3434d67548d71747b1ab3a32051d27a30c71f/groundingdino_swint_ogc.pth 
 => _IncompatibleKeys(missing_keys=[], unexpected_keys=['label_enc.weight'])
/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/modeling_utils.py:830: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
text_encoder/model.safetensors not found
Fetching 16 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 5680.93it/s]
/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
  0%|                                                                                                                        | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/gradio/blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/data/yoke/Grounded-Segment-Anything/gradio_app.py", line 254, in run_grounded_sam
    image = pipe(prompt=inpaint_prompt, image=image_pil, mask_image=mask_pil).images[0]
  File "/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/data/yoke/anaconda3/envs/py310/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py", line 854, in __call__
    latent_model_input = torch.cat([latent_model_input, mask, masked_image_latents], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 109 for tensor number 2 in the list.


yoke233 commented 1 year ago

The image's height and width should each be a multiple of 64.
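A minimal sketch of that fix: round each input dimension down to the nearest multiple of 64 and resize the image and mask to the same size before calling the pipeline, so the latent, mask, and masked-image tensors concatenated in `pipeline_stable_diffusion_inpaint.py` line up. The helper name below and the commented usage (reusing `image_pil`, `mask_pil`, `pipe`, and `inpaint_prompt` from the traceback) are illustrative assumptions, not code from the repo.

```python
def round_down_to_64(n: int) -> int:
    # Snap a dimension down to the nearest multiple of 64 (minimum 64).
    return max(64, (n // 64) * 64)

# Hypothetical usage in run_grounded_sam, just before the inpainting call:
#
#   w, h = image_pil.size
#   size = (round_down_to_64(w), round_down_to_64(h))
#   image_pil = image_pil.resize(size)
#   mask_pil = mask_pil.resize(size)
#   image = pipe(prompt=inpaint_prompt, image=image_pil,
#                mask_image=mask_pil).images[0]
```

Cropping or padding instead of resizing would also work if aspect-ratio distortion matters for your use case.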